Executive Summary


The purpose of this report is to describe the technical qualities of the 2018-2019 operational administration of the 
English language arts/literacy (ELA/L) and mathematics summative assessments in grades 3 through 8 and high 
school. Committees of educators, state education agency staff, and national experts led the work in the 
development of the summative assessments that are aligned to the Common Core State Standards (CCSS) and are 
intended to measure more complex skills like critical thinking, persuasive writing, and problem-solving. New 
Meridian assumes the responsibility for management of the summative assessments, as well as item development 
and forms construction. For the academic year 2018-2019, participating states and agencies included the District of 
Columbia, Department of Defense Education Activity, and Maryland. 


The ELA/L assessments focus on reading and comprehending a range of sufficiently complex texts independently and 
writing effectively when analyzing text. The ELA/L assessments contain literary and informational texts; each passage 
set has four to eight brief comprehension and vocabulary questions. ELA/L constructed-response items include three 
types of tasks: literary analysis, narrative writing, and research simulation. For each task, students are instructed to 
read one or more texts, answer several brief questions, and then write an essay based on the material they read. 


The mathematics assessments contain tasks that measure a combination of conceptual understanding, applications, 
skills, and procedures. Mathematics constructed-response items consist of tasks designed to assess a student’s 
ability to use mathematics to solve real-life problems. Some of the tasks require students to describe how they 
solved a problem, while other tasks measure conceptual understanding and ability to apply concepts by means of 
selected-response or technology-enhanced items. In addition, students are required to demonstrate their skills and 
knowledge by answering innovative selected-response and short-answer questions that measure concepts and skills. 


In both content areas, students also demonstrate their acquired skills and knowledge by answering
selected-response items and fill-in-the-blank questions. Each assessment consists of multiple units; in addition, one of
the mathematics units is split into two sections: a non-calculator section and a calculator section.


The summative assessments are designed to achieve several purposes. First, the tests are intended to provide 
evidence to determine whether students are on track for college- and career-readiness. Second, the tests are 
structured to assess the full range of CCSS and measure the total breadth of student performance. Finally, the tests
are designed to provide data to help inform classroom instruction, student interventions, and professional 
development. 


This technical report includes the following topics: 
• background and purpose of the assessments;
• test development of items and forms;
• test administration, security, and scoring;
• classical item analyses and differential item functioning;
• reliability and validity of scores;
• item response theory (IRT) calibration and scaling;
• performance level setting;
• student characteristics;
• development of the score reporting scales and student performance;
• student growth measures; and
• quality control procedures.

2019 Technical Report. New Meridian, February 28, 2020.


The information provided in this technical report is intended for use by those who evaluate tests, interpret scores, or 
use test results in making educational decisions. It is assumed that the reader has technical knowledge of test 
construction and measurement procedures, as stated in Standards for Educational and Psychological Testing 
(American Educational Research Association [AERA], American Psychological Association [APA], and National Council 
on Measurement in Education [NCME], 2014). 
Section 1: Introduction


1.1 Background 


States associated with the Partnership for Assessment of Readiness for College and Careers (PARCC) came together
in early 2010 with a shared vision of ensuring that all students, regardless of income, family background, or
geography, have equal access to a world-class education that will prepare them for success after high school in
college and/or careers. The goal was to develop new assessments that tie into more rigorous academic expectations
and help prepare students for success in college and the workforce, as well as to provide information back to
teachers and parents about where students are on their path to success. Calling on the expertise of thousands of
teachers, higher education faculty, and other educators in multiple states, the partnership built an assessment
system comprising a high-quality set of summative assessments, diagnostic assessments, formative tasks, and other
support materials for teachers, including professional development and communications tools.


The partnership develops and administers next-generation assessments that, compared to traditional K-12 
assessments, more accurately measure student progress toward college and career readiness. The assessments are 
aligned to the Common Core State Standards (CCSS) and include both English language arts/literacy (ELA/L) 
assessments (grades 3 through 11) and mathematics assessments (grades 3 through 8 and high school). Compared 
to traditional standardized tests, these assessments are intended to measure more complex skills like critical 
thinking, persuasive writing, and problem-solving. 


In 2013, the PARCC Governing Board launched Parcc Inc., a nonprofit organization designed to support the
successful delivery of the tests in 2014-2017 and the long-term success of the multi-state partnership. States
continued to govern decisions about the assessment system; the nonprofit organization was their “agent” for 
overseeing the many vendors involved in the assessment system, coordinating the multiple work groups and 
committees (including Governing Board meetings), managing the intellectual property, overseeing the research 
agenda and the Technical Advisory Committee, and developing and launching the multiple non-summative tools. 


Summative assessments for the first operational administration were constructed in 2014. Eleven states, including
the District of Columbia, participated in the first administration of the summative assessments during the 2014-2015
school year. Six states, the Bureau of Indian Education, and the District of Columbia participated in the second
administration in school year 2015-2016. Five states, the Bureau of Indian Education, the Department of Defense
Education Activity, and the District of Columbia participated in the third administration in school year 2016-2017. Four
states, the Bureau of Indian Education, the Department of Defense Education Activity, and the District of Columbia
participated in the fourth administration in school year 2017-2018.


After the Parcc Inc. contract ended in June 2017, participating states and agencies released the intellectual
property (IP) of the contract to the Council of Chief State School Officers (CCSSO), and also contracted with New 
Meridian to manage the IP and provide item development, forms construction, and governance. Starting in August 
2017, New Meridian oversaw item development, data review for field test items, and test construction activities. 


New Meridian, in coordination with multiple states and vendors, developed an alternate form of the summative
assessment to meet several states' need for shorter testing times. Developed through extensive research and
guidance from the Technical Advisory Committee, the alternate blueprint was available in spring 2019 in addition to
the original blueprint. New Meridian’s state-centric solution to educational assessment allowed states the flexibility
of selecting the assessment solution that best fit their specific needs. For the 2018-2019 academic year,
participating states and agencies included the District of Columbia, Department of Defense Education Activity, and 
Maryland. 


The purpose of this technical report is to describe the operational administration of the summative assessments in 
the 2018-2019 academic year, including test form construction, test administration, item scoring, student 
characteristics, classical item analysis results, reliability results, evidence of validity, item response theory (IRT) 
calibrations and scaling, performance level setting procedure, growth measures, and quality control procedures. 


1.2 Purpose of the Operational Tests 


The summative assessments are designed to achieve several purposes. First, the assessments are intended to 
provide evidence to determine whether students are on track for college- and career-readiness. Second, the 
assessments are structured to assess the full range of CCSS and measure the total breadth of student performance.
Finally, the assessments are designed to provide data to help inform classroom instruction, student interventions, 
and professional development. 


1.3 Composition of Operational Tests 


Each operational test form is constructed to reflect the test blueprint in terms of content, standards measured, and 
item types. Sets of common items, included to provide data to support horizontal linking across test forms within a 
grade and content area, are proportionally representative of the operational test blueprint. The summative 
assessment is a mixed-format test. The current summative assessments are administered in either computer-based 
(CBT) or paper-based (PBT) format. 
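
The report does not say here how the common-item sets are made proportionally representative of the blueprint. As a minimal sketch only, the idea can be illustrated with a largest-remainder allocation: each blueprint category receives a share of the common items proportional to its share of the full form. The function name, the category labels, and all item counts below are hypothetical and are not taken from the operational blueprints.

```python
def proportional_allocation(blueprint_counts, n_common):
    """Split n_common items across categories in proportion to blueprint_counts.

    Uses the largest-remainder method (an assumption for illustration):
    floor each exact share, then hand leftover items to the categories
    with the largest fractional remainders.
    """
    total = sum(blueprint_counts.values())
    exact = {cat: n_common * count / total for cat, count in blueprint_counts.items()}
    alloc = {cat: int(share) for cat, share in exact.items()}  # floor of each share
    leftover = n_common - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda cat: exact[cat] - alloc[cat], reverse=True)
    for cat in by_remainder[:leftover]:
        alloc[cat] += 1
    return alloc


# Hypothetical blueprint: number of items per category on a 40-item form.
blueprint = {"A": 20, "B": 10, "C": 7, "D": 3}
common_set = proportional_allocation(blueprint, n_common=10)
print(common_set)
```

With these made-up counts the ten common items always sum to the requested total, and each category's share tracks its weight on the full form as closely as whole items allow.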


The ELA/L assessments focus on reading and comprehending a range of sufficiently complex texts independently and 
writing effectively when analyzing text. The ELA/L assessments contain literary and informational texts; each passage 
set has four to eight brief comprehension and vocabulary questions. ELA/L constructed-response items include three 
types of tasks: literary analysis, narrative writing, and research simulation. For each task, students are instructed to 
read one or more texts, answer several brief questions, and then write an essay based on the material they read. 


The mathematics assessments contain tasks that measure a combination of conceptual understanding, applications, 
skills, and procedures. Mathematics constructed-response items consist of tasks designed to assess a student’s 
ability to use mathematics to solve real-life problems. Some of the tasks require students to describe how they 
solved a problem, while other tasks measure conceptual understanding and ability to apply concepts by means of 
selected-response or technology-enhanced items. In addition, students are required to demonstrate their skills and 
knowledge by answering innovative selected-response and short-answer questions that measure concepts and skills. 


In both content areas, students also demonstrate their acquired skills and knowledge by answering
selected-response items and fill-in-the-blank questions. Each assessment consists of multiple units; in addition, one of
the mathematics units is split into two sections: a non-calculator section and a calculator section.
1.4 Intended Population


The tests are intended for students taking ELA/L in grades 3 through 11, and/or mathematics in grades 3 through 8, 
as well as students taking high school mathematics (i.e., Algebra I, Geometry, Algebra II, and Integrated Mathematics
I-III). The tests measure whether these students are meeting state academic standards and mastering
the knowledge and skills needed to progress in their K-12 education and beyond. 


1.5 Groups and Organizations Involved with the Summative Assessments 


New Meridian is a nonprofit organization that assumes the responsibility for management of the assessments, as 
well as item development and forms construction of the assessments. 


Committees of educators, state education agency staff, and national experts lead the work of the assessments. 
These committees include: 


• the Governing Board that makes major policy and operational decisions;
• the Technical Advisory Committee that helps ensure all assessments will provide reliable results to inform
  valid instructional and accountability decisions;
• the State Lead Council that coordinates all aspects of development of the summative assessment system
  and serves as the conduit to the Technical Advisory Committee and the Governing Board; and
• the ELA/L, Mathematics, and Accessibility and Accommodation Features operational working groups.


Pearson serves as the primary contractor for the operational administration and is responsible for producing all 
testing materials, packaging and distribution, receiving and scanning of materials, and scoring, as well as program 
management and customer service. In addition, test and item development activities are conducted by Pearson 
under the guidance and oversight of New Meridian. 


Pearson Psychometrics is responsible for all psychometric analyses of the operational test data. This includes 
classical item analyses, differential item functioning (DIF) analyses, item calibrations based on item response theory 
(IRT), scaling, and development of all conversion tables. 


Human Resources Research Organization (HumRRO) serves as a subcontractor and is responsible for replicating item 
calibrations based on item response theory (IRT), scaling, and development of all conversion tables. 


Pearson Psychometrics is also responsible for reviewing and comparing the results obtained independently by
Pearson and by HumRRO, including IRT calibrations, conversion tables, summative and claim scale scores,
performance level classifications, and subclaim performance level classifications. 
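
This section does not state which statistics are used to compare the two independent sets of results. As a hedged illustration, two agreement summaries that appear in this report's abbreviation list, the mean absolute difference (MAD) and the root mean square difference (RMSD), can be computed between paired parameter estimates as sketched below. The function names and the parameter values are hypothetical.

```python
import math


def mean_absolute_difference(primary, replicate):
    """Average of |primary_i - replicate_i| over paired estimates."""
    if len(primary) != len(replicate) or not primary:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(primary, replicate)) / len(primary)


def root_mean_square_difference(primary, replicate):
    """Square root of the mean squared difference over paired estimates."""
    if len(primary) != len(replicate) or not primary:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(primary, replicate))
    return math.sqrt(squared / len(primary))


# Hypothetical difficulty estimates for four items from two independent calibrations.
primary_b = [0.50, -1.20, 0.75, 1.10]
replicate_b = [0.50, -1.25, 0.80, 1.10]

mad = mean_absolute_difference(primary_b, replicate_b)
rmsd = root_mean_square_difference(primary_b, replicate_b)
print(f"MAD = {mad:.3f}, RMSD = {rmsd:.3f}")
```

Values near zero on both summaries would indicate that the replication reproduced the primary calibration closely. Any tolerance for flagging a discrepancy would be a program decision that this sketch does not attempt to represent.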


1.6 Overview of the Technical Report 


This report begins by providing explanations of the test form construction process, test administration, and scoring 
of the test items. Subsequent sections of the report present descriptions of student characteristics, results of 
classical item analyses, item response theory (IRT) calibrations and scaling, performance level setting procedure, 
quality control procedures, results of students’ scale score analyses, results of reliability analyses, evidence of 
validity, and measures of student growth. 
The technical report contains the following sections:


Section 2 - Test Development 
This section describes the test design and the procedures followed during the development of operational test 
forms. 


Section 3 - Test Administration 
This section presents the operational administration schedule, information regarding test security and
confidentiality, accessibility features and accommodations, and testing irregularities and security breaches.


Section 4 - Item Scoring 
The key-based and rule-based processes for machine-scored items, as well as the training and monitoring processes
for human-scored items, are provided in this section.


Section 5 - Classical Item Analysis 
The classical item-level statistics calculated for the operational test data, the flagging criteria used to identify items
that performed differently than expected, and the results of these analyses are presented in this section.


Section 6 - Differential Item Functioning 
In this section, the methods for conducting differential item functioning analyses as well as corresponding flagging
criteria are described. This is followed by definitions of the comparison groups and subsequent results for the
comparison groups.


Section 7 - IRT Calibration and Scaling 
This section presents information related to the calibration and scaling of item response data, including data
preparation, the calibration process, model fit evaluation, and items excluded from score reporting. In addition, the
scaling process is described and evaluated.


Section 8 - Performance Level Setting 
Performance levels and policy definitions, as well as the processes followed to establish performance level
thresholds, are described in this section.


Section 9 - Quality Control Procedures 
All aspects of quality control are presented in this section. These activities range from quality assurance of item
banking, test form construction, and all testing materials to quality control of scanning, image editing, and scoring.
This is followed by a detailed description of the steps taken to ensure that all psychometric analyses were of the
highest quality.


Section 10 - Operational Test Forms 
This section describes the operational test forms, including high-level blueprints for the assessments.


Section 11 - Student Characteristics
This section describes the composition of test forms, rules for inclusion of students in analyses, distributions of
students by grade, mode, and gender, and distributions of demographic variables of interest.
Section 12 - Scale Scores
This section provides an overview of the claims and subclaims, describes the development of the reporting scales
and conversion tables, and presents scale score distributions. Finally, information regarding the interpretation of
claim scores and subclaim scores is presented.


Section 13 - Reliability 
The results of internal consistency reliability analyses and corresponding standard errors of measurement for each
grade, content area, and mode (CBT or PBT), for all students and for subgroups of interest, are provided in this
section. This is followed by reliability results for subscores and reliability of classification (i.e., decision accuracy and
decision consistency). Finally, expectations and results for inter-rater agreement for hand-scored items are
summarized.


Section 14 - Validity 
Validity evidence based on analyses of the internal structure of the tests is provided in this section. Correlations
between subscores are reported by grade, content area, and mode (CBT or PBT) for all students.


Section 15 - Student Growth Measures 
This section provides details on student growth percentiles (SGP). Information about the model, model fit, and SGP
averages at the overall level for all students, and for subgroups of interest, is provided.


References 


Appendices 
For ease of use, tables in the appendices are numbered sequentially according to the section represented by the
tables. For example, the first appendix table for Section 6 is numbered A.6.1, the second appendix table for Section 6
is numbered A.6.2, and so on.


Addendum 


The addendum presents the results of analyses for the fall operational administration. These results are reported 
separately from the spring results because fall testing involved a nonrepresentative subset of students testing only 
ELA/L grades 9, 10, and 11, as well as Algebra I, Geometry, and Algebra II.


To organize the addendum, tables are numbered sequentially according to the section represented by the tables. 
For example, the first addendum table for Section 11 is numbered ADD.11.1, the second addendum table for Section 
11 is numbered ADD.11.2, and so on. 
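
The numbering convention for appendix and addendum tables described above can be expressed as a small helper. This is only an illustrative sketch of the stated pattern (prefix, section number, sequence number). The function name and its parameters are hypothetical and not part of the report.

```python
def table_label(section, index, addendum=False):
    """Build a table label such as 'A.6.1' (appendix) or 'ADD.11.2' (addendum).

    section: report section the table belongs to (e.g. 6 for Section 6)
    index:   1-based position of the table within that section
    """
    if section < 1 or index < 1:
        raise ValueError("section and index must be positive integers")
    prefix = "ADD" if addendum else "A"
    return f"{prefix}.{section}.{index}"


print(table_label(6, 1))                   # A.6.1
print(table_label(6, 2))                   # A.6.2
print(table_label(11, 1, addendum=True))   # ADD.11.1
```

A reader looking for the spring differential item functioning results would therefore search the appendices for labels beginning with A.6, while the corresponding fall tables carry the ADD prefix.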
Abbreviation/Acronym    Definition


1PL/PC One-parameter/Partial Credit Model 

2PL/GPC Two-parameter Logistic/Generalized Partial Credit Model 
3PL/GPC Three-parameter Logistic/Generalized Partial Credit Model 
A1 Algebra I

A2 Algebra II 

AAF Accessibility, Accommodations, and Fairness 

ABBI Assessment Banking for Building and Interoperability 
AERA American Educational Research Association 

AIS Average Item Score 

AIQ Assessment and Information Quality

AmerIndian American Indian/Alaska Native

APA American Psychological Association 

ASC Additional and Supporting Content (Mathematics) 
ASL American Sign Language 

ATA Automatic Test Assembler 

CBT Computer-Based Test 

CCSS Common Core State Standards 

CDQ Customer Data Quality 

CSEM Conditional Standard Error of Measurement 

DIF Differential Item Functioning 

DPL Digital Production Line 

DPP Digital Pre-press 

EcnDis Economically disadvantaged 

EBSS Evidence-based Standard Setting 

ELA/L English Language Arts/Literacy 

EL English Learners 

ELN Not an English learner 

ELY English Learners 

EOC End-of-Course 

EOY End-of-Year 

ePEN2 Electronic Performance Evaluation Network second generation 
ESEA Elementary and Secondary Education Act 

FRL Free or Reduced-price Lunch 

FS Full Summative 

FT Field Test 

GO Geometry 

HOSS Highest Obtainable Scale Score 

IA Item Analysis 

ICC Item Characteristic Curve 

IDEA Individuals with Disabilities Education Act 

IEP Individualized Education Program 

INF Information Curve 

IP Intellectual Property 

IRA Inter-rater Agreement 

IRF Item Response File 

IRT Item Response Theory 

ISR Individual Student Report 

K-12 Kindergarten to Grade 12 

LEA Local Education Agency 

LID Local Item Dependence 

LOSS Lowest Obtainable Scale Score 

LP Large Print 

M1 Integrated Mathematics I

M2 Integrated Mathematics II 

M3 Integrated Mathematics III 

MAD Mean Absolute Difference 

MC Major Content (Mathematics) 

MH Mantel-Haenszel 

MP Modeling Practice (Mathematics) 
MR Mathematical Reasoning 

Multiracial Multiple Races Selected 

NAEP National Assessment of Educational Progress 
NCLB No Child Left Behind 

NCME National Council on Measurement in Education 
NoEcnDis Not economically disadvantaged 
NSLP National School Lunch Program 

OE responses Open-ended responses 

OMR Optical Mark Reading 

OWG Operational Working Group 

Pacific Islander Native Hawaiian or Pacific Islander 
PARCC Partnership for Assessment of Readiness for College and Careers 
PBA Performance-Based Assessment 
PBT Paper-Based Test 

PCR Prose Constructed Response (ELA/L) 
PEJ Postsecondary Educators’ Judgment 
PLD Performance Level Descriptor 

PLS Performance Level Setting 

PV Product Validation 

QA Quality Assurance 

RD Reading (ELA/L) 

RI Reading Information (ELA/L) 

RL Reading Literature (ELA/L) 

RMSD Root Mean Square Difference 

RV Reading Vocabulary (ELA/L) 

RST Raw-score-to-theta 

SD Standard Deviation 

SDF Student Data File 

SE Standard Error 

SEJ Standard Error of Judgment 

SEM Standard Error of Measurement 

SIRB Scored Item Response Block 

SMD Standardized Mean Difference 

SSMC Single Select Multiple Choice 

SWD Students with Disabilities 

SWDN Not student with disability 

SWDY Students with Disabilities 

TCC Test Characteristic Curve 

TTS Text to Speech 

UIN Unique Item Number 

WE Writing Written Expression (ELA/L) 
WKL Writing Knowledge Language and Conventions (ELA/L) 
WLS Weighted Least Squares 

WR Writing (ELA/L) 

WRMSD Weighted Root Mean Square Difference 
Section 2: Test Development


2.1 Overview of the Summative Assessments, Claims, and Design 


Aligned to the Common Core State Standards (CCSS) as articulated in the Model Content Frameworks, the 
summative assessments are designed to determine whether students are college- and career-ready or on track, 
assess the full range of the CCSS, measure the full range of student performance, and provide data to help inform 
instruction, interventions, and professional development. Test development is an ongoing process involving 
educators, researchers, psychometricians, subject matter professionals, and assessment experts who participate in 
the development of the test design and its underlying foundational documents; develop and review passages and 
items used to build the summative assessments; monitor the program for quality, accessibility, and fairness for all 
students; and construct, review, and score the assessments. 


The summative assessments include both English language arts/literacy (ELA/L) and mathematics assessments in 
grades 3 through 8 and high school. The high school mathematics tests include traditional mathematics and 
integrated mathematics course pathways. Assessments contain selected response, brief and extended constructed 
response, technology-enabled and technology-enhanced items (TEI), as well as performance tasks.
Technology-enabled items are single-response or constructed-response items that involve some type of digital stimulus or
open-ended response box with which the students engage in answering questions. Technology-enhanced items involve
specialized student interactions for collecting performance data. In other words, the act of performing the task is the
way in which data is collected. Students may be asked, among other interactions, to categorize information, organize
or classify data, order a series of events, plot data, generate equations, highlight text, or fill in a blank. One example
of a TEI is an interaction in which students are asked to drag response options onto a Venn diagram to show the
relationship among ideas.
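
To make the Venn diagram example concrete, the sketch below shows one way such an interaction could be represented and machine scored: the key maps each response option to a region of the diagram, and the student's response is the set of placements captured by the interaction. Everything here is an assumption for illustration, including the item content, the region names, and the partial-credit rule. It does not describe the program's operational scoring rules.

```python
def count_correct_placements(key, response):
    """Count the options the student dropped into the keyed region."""
    return sum(1 for option, region in key.items() if response.get(option) == region)


def score_item(key, response):
    """Hypothetical rule: 2 points if every placement is correct,
    1 point if at least half are correct, otherwise 0."""
    correct = count_correct_placements(key, response)
    if correct == len(key):
        return 2
    if 2 * correct >= len(key):
        return 1
    return 0


# Hypothetical item: compare birds and bats on a two-circle Venn diagram.
key = {
    "has feathers": "birds",
    "nurses its young": "bats",
    "has wings": "both",
    "lays eggs": "birds",
}

# A student response with two of the four options placed correctly.
response = {
    "has feathers": "birds",
    "nurses its young": "bats",
    "has wings": "birds",
    "lays eggs": "both",
}

print(score_item(key, response))
```

The point of the sketch is that the placements themselves are the performance data: no separate answer is typed or selected, so the scoring rule operates directly on where each option was dropped.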


The summative assessments offer a wide range of accessibility features for all students and accommodations for 
students with disabilities (e.g., screen reader, assistive technology, braille, large print [LP], text-to-speech [TTS], and 
American Sign Language [ASL] video versions of the test, as well as response accommodations that allow students to 
respond to test items using different formats). For English learners who are native Spanish speakers, participating 
states and agencies offer the mathematics assessments in Spanish, and both LP and TTS versions of the test in 
Spanish (refer to the Accessibility Features and Accommodations Manual for in-depth information). 


2.1.1 English Language Arts/Literacy (ELA/L) Assessments—Claims and Subclaims 


The ELA/L summative assessment at each grade level consists of three task types: literary analysis, research 
simulation, and narrative writing. For each performance-based task, students are asked to read or view one or more 
texts, answer comprehension and vocabulary questions, and write an extended response that requires them to draw 
evidence from the text(s). The summative assessment also contains literary and informational reading passages with 
comprehension and vocabulary questions. 


The claim structure, grounded in the CCSS, undergirds the design and development of the ELA/L summative 
assessments. 


Master Claim. The master claim is the overall performance goal for the ELA/L Summative Assessment System:
students must show that they are college- and career-ready or on track to readiness as demonstrated
through reading and comprehending grade-level texts of appropriate complexity and writing effectively when
using and/or analyzing sources.


Major Claims: 1) reading and comprehending a range of sufficiently complex texts independently, and 2) writing 
effectively when using and/or analyzing sources. 


Subclaims: The subclaims further explicate what is measured on the summative assessments and include claims 
about student performance on the standards and evidences outlined in the evidence tables for reading and writing 
(refer to the test specifications documents). The claims and evidences are grouped into the following categories: 


1. Vocabulary Interpretation and Use
2. Reading Literature
3. Reading Informational Text
4. Written Expression
5. Knowledge of Language and Conventions


2.1.2 Mathematics Assessments—Claims and Subclaims 


The summative mathematics assessment at each grade level includes both short- and extended-response questions 
focused on applying skills and concepts to solve problems that require demonstration of the mathematical practices 
from the CCSS with a focus on modeling and reasoning with precision. The assessments also include
performance-based short-answer questions focused on conceptual understanding, procedural skills, and application.


The claim structure, grounded in the CCSS, undergirds the design and development of the summative assessments. 


Master Claim. The degree to which a student is college- or career-ready or on track to being ready in mathematics. 
The student solves grade-level/course-level problems aligned to the Standards for Mathematical Content with 
connections to the Standards for Mathematical Practice. 


Subclaims: The subclaims further explicate what is measured on the summative assessments and include claims 
about student performance on the standards and evidences outlined in the evidence statement tables for 
mathematics (refer to the test specifications documents). The claims and evidences are grouped into the following
categories:


Subclaim A: Major Content with Connections to Practices 
Subclaim B: Additional and Supporting Content with Connections to Practices 


Subclaim C: Highlighted Practices with Connections to Content: Expressing mathematical reasoning by constructing 
viable arguments, critiquing the reasoning of others, and/or attending to precision when making mathematical 
statements 


Subclaim D: Highlighted Practice with Connections to Content: Modeling/Application by solving real-world problems 
by applying knowledge and skills articulated in the standards 
2.2 Test Development Activities


Test development activities began with the standards and model content frameworks. From these, more than 2,000 
educators, researchers, and psychometricians have developed the test specifications documents that guide the 
development of test items and the composition of the tests. These documents include the College- and
Career-Ready Determinations and Performance-Level Descriptions, Claim Structure, Evidence Statement Tables, Blueprints,
Informational Guides, Passage Selection Guidelines, Mathematics Sequencing Guidelines, Task Generation Models, 
Fairness and Sensitivity Guidelines, Text Selection Guidelines, and the Style Guide. Refer to the website for further 
information about these documents. 


2.2.1 Item Development Process 


Test and item development activities were conducted by Pearson under the guidance and oversight of the K-12 
state leads, the Higher Education Leadership Team, the Technical Advisory Committee, the Operational Working 
Group (OWG) members from each of the member states, the Text and Content Item Review Committees, and staff 
members from New Meridian, the project manager. 


Developing high-quality assessment content with authentic stimuli for computer-based tests (CBT) and paper-based
tests (PBT) measuring rigorous standards is a complex process involving the services of many experts including 
assessment designers, psychometricians, managers, trainers, content providers, content experts, editors, artists, 
programmers, technicians, human scorers, advisors, and members of the OWGs. 


Bank Analysis and Item Development Plan 
The summative item bank houses passages and items at each assessed grade level and subject. The bank supports
the administration of the assessments, along with item release and practice tests. Items are developed and field
tested annually. Prior to the annual item development cycle, the item development teams, in conjunction with 
members of the OWGs for ELA/L and mathematics, evaluated the strengths of the bank and considered the needs 
for future tests to establish an item development plan. 


Text Selection for ELA/L 
Using the Passage Selection Guidelines, English language arts subject matter experts were trained to search for
appropriate passages to support an annual pool of passages for consideration. Guided by the test specifications
documents, Pearson recruited, trained, and managed the contracted subject matter experts to deliver the number 
of texts specified in the annual asset development plan. The Passage Selection Guidelines provided a text complexity 
framework and guidance on selecting a variety of text types and passages that allow for a range of 
standards/evidences to be demonstrated to meet the assessment claims. ELA/L tests are based on authentic texts,
including multimedia stimuli. Authentic texts are grade-appropriate texts that are not developed for the purposes
of the assessment or to achieve a particular readability metric, but reflect the original language of the authors. 
Pearson content experts reviewed the passages for adherence to the Passage Selection Guidelines to meet the 
annual asset development plan described above in the number and distribution of genres and topics prior to review 
and consideration by the Text Review Committee. ELA/L item development was not conducted until after texts were 
approved by the Text Review Committee. 


Item Development 


Guided by foundational documents, Pearson recruited and trained the item writers and managed the item writing to 
develop the number of items specified in the annual asset development plan. Prior to further committee reviews, 
the assessment teams at Pearson reviewed the items for content accuracy, alignment to the standards, range of
difficulty, adherence to universal design principles (which maximize the participation of the widest possible range of 
students), bias and sensitivity, and copy editing to enable the accurate measurement of the standards. 


2.2.2 Item and Text Review Committees 


Members of the OWGs for ELA/L and mathematics, state-level experts, local educators, post-secondary faculty, and 
community members conducted rigorous reviews of every item and passage being developed for the summative 
assessment system to ensure all test items are of the highest quality, aligned to the standards, and fair for all 
student populations. All reviewers were nominated by their state education agency. The purpose of the educator 
reviews was to provide feedback to Pearson and participating states and agencies on the quality, accuracy, 
alignment, and appropriateness of the test passages and items developed annually for the summative assessments. 
The meetings were conducted either in person or virtually and included large group training on the expectations and 
processes of each meeting, followed by breakout meetings of grade/subject working committees where additional 
training was provided. 


Text Review 
The Text Review is a review and approval by the Text Review Committee of the texts eligible for item development.
Participants reviewed and provided feedback to Pearson and participating states and agencies about the grade-level
appropriateness, content, and potential bias concerns, and reached consensus about which texts would move 
forward for development. The Text Review Committee was made up of members of both Content Item Review and 
Bias and Sensitivity Review Committees. 


Content Item Review 
During Content Item Review, committees reviewed and edited test items for adherence to the foundational
documents, basic universal design principles, Accessibility Guidelines, associated item metadata, and the Style
Guide. Committees accessed the item content within the Pearson Assessment Banking for Building and 
Interoperability (ABBI) system that previews how the passages and items will be displayed in an operational online 
environment. Committees also verified that the appropriate scoring rule had been applied to each item. The Content 
Item Review Committees were made up of OWG members and educators nominated by participating states. 


Bias and Sensitivity Review 
Educators and community members make up the committee that reviews items and tasks to confirm that there are
no bias or sensitivity issues that would interfere with a student’s ability to achieve his or her best performance. The
committee reviewed items and tasks to evaluate adherence to the Fairness and Sensitivity Guidelines, and to ensure 
that items and tasks do not unfairly advantage or disadvantage one student or group of students over another. Bias 
and Sensitivity Committee members made edits and modifications to items and passages to eliminate sources of 
bias and improve accessibility for all students. 


Editorial Review 
The Editorial Review Committee consists of editors who reviewed up to 10 percent of the items and tasks. The
committee reviewed the items for grammar, punctuation, clarity, and adherence to the Style Guide.


Data Review 
Following the field test, educator and bias committee members met to evaluate test items and associated
performance data with regard to appropriateness, level of difficulty, and potential gender, ethnic, or other bias, then
recommended acceptance or rejection of each field-test item for inclusion on an operational assessment. The Data
Review Committee also made recommendations that items be revised and re-field tested. Items that were approved
by the committee are eligible for use on operational summative assessments. 


2.2.3 Operational Test Construction 


Under the guidance in the operational test form creation specifications, Pearson constructed the operational forms
to adhere to the test blueprints and the assessment goals outlined in the form creation specifications. These goals
were:


• test forms designed to measure well across the full range of student ability;
• scores that are comparable among forms and across test administrations;
• scales that support classification of students into performance levels;
• maximization of the number of parallel forms;
• minimization of overexposure of items; and
• adherence to standards for validity, reliability, and fairness (Standards for Educational and Psychological
Testing, AERA, APA, & NCME, 2014).


Each content-area and grade-level assessment was based on a specific test blueprint that guided how each test was 
built. Test blueprints determined the range and distribution of content, and the distribution of points across the 
subclaims and task types. 


Multiple core forms were constructed for a given assessment to enhance test security and to support opportunity
for item release. Core forms were the operational test forms consisting of only those items that counted toward a
student’s score. These forms were designed to facilitate psychometric equating through a common item linking
strategy and to be as “parallel” as possible in content and test-taking experience. Evaluation criteria
for parallelism included adherence to blueprint; sequencing of content across the forms; statistical averages and
distributions for difficulty (e.g., p-value) and discrimination (e.g., polyserial correlation); item type and cognitive
complexity; and passage characteristics for ELA/L including genre, topics, word count, and text complexity.
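

The statistical part of the parallelism check can be illustrated with a minimal sketch. It assumes scored responses are available as one list of item scores per student, and it uses a corrected item-rest Pearson correlation as a simple stand-in for the polyserial correlation named above. The function names and the sample data are hypothetical and are not taken from the report.

```python
from statistics import mean, pstdev


def item_p_value(scores, max_points):
    """Classical difficulty: mean item score as a proportion of the maximum points."""
    return mean(scores) / max_points


def pearson_r(x, y):
    """Plain Pearson correlation (assumes both inputs have nonzero variance)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def form_summary(responses, max_points):
    """Summarize one form.

    responses: list of per-student lists of item scores, all in the same item order.
    max_points: maximum score for each item, in the same order.
    """
    totals = [sum(r) for r in responses]
    p_values, discriminations = [], []
    for j, points in enumerate(max_points):
        item_scores = [r[j] for r in responses]
        p_values.append(item_p_value(item_scores, points))
        # Item-rest correlation: the item is removed from the total it is compared with.
        rest = [t - s for t, s in zip(totals, item_scores)]
        discriminations.append(pearson_r(item_scores, rest))
    return {
        "mean_p": mean(p_values),
        "sd_p": pstdev(p_values),
        "mean_disc": mean(discriminations),
    }


# Hypothetical data: four students, three items worth 1, 1, and 2 points.
form_a = [[1, 1, 2], [1, 0, 1], [0, 0, 0], [1, 1, 2]]
summary_a = form_summary(form_a, [1, 1, 2])
```

Running the same summary on each candidate form and comparing the resulting means and spreads gives a rough view of how closely the forms match statistically; the content criteria (blueprint, sequencing, passage characteristics) still require separate review.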


Additionally, appropriate forms were identified as accessibility and accommodated forms. The forms are 
accommodated to support braille, large print, human reader/human signers, assistive technology, text-to-speech, 
closed captioning, and Spanish. Human reader/human signers and Spanish are provided for mathematics 
assessments only. Closed captioning is provided for ELA/L assessments only. 


Test Construction Activities 
After the data review meetings and prior to the test construction meetings, Pearson assessment specialists
constructed initial versions of all the core forms. Content specialists constructed the initial core forms based on the
support documents and specific processes to achieve fair parallel forms. The following steps were used to construct 
the operational core forms taken to the Test Construction Committee for review. 


1. constructed the online forms to match the blueprint and test construction specifications
2. constructed the paper forms to match the blueprint and test construction specifications
3. constructed accommodated and accessibility forms to match the blueprint, test construction specifications,
and Accessibility, Accommodations, and Fairness (AAF) constraints


The test construction process included iterative steps between content specialists and psychometricians. Custom
test construction reports generated by the Pearson psychometric team provided information on adherence to 
blueprint and statistical averages/distributions of item difficulty and discrimination describing the forms and 
allowing comparison of the forms. These reports facilitated content changes to better achieve the test construction 
goals. Equating across operational forms within an administration was accomplished by repeating core items across 
forms. Linking across administrations for operational forms was accomplished by including prior operational items 
on the current operational test forms. 


Pearson assessment specialists identified forms for each grade/subject suitable for use as the accommodated forms. 
Pearson psychometrics reviewed the psychometric properties of each of the accommodated forms with respect to 
the required criteria. The content of these forms was also reviewed by Pearson accessibility specialists allowing for 
content changes prior to the Test Construction Committee meetings. 


These test construction activities provided significant inputs to commence the meetings, including:
• the proposed items for the initial operational core forms and the accommodated forms described above
• reports describing each form and comparing parallel forms
• recommended accommodated forms


Test Construction Meeting to Review Test Construction Inputs 
Members of the Content Item Review Committees and the AAF OWG participated in the building of operational core
forms that met the summative assessment requirements. In that process, they met in an in-person meeting to
review and make recommendations for changes so that test forms conformed to both the content and psychometric 
requirements of the assessment. 


Accommodated Form Review Process 
In addition to participating in many of the development activities including the Text Review and the Bias and
Sensitivity Review meetings, the AAF OWG reviewed the proposed accommodated forms at the Test Construction
Committee meeting for accessibility to make sure that the content can be accommodated for students with 
disabilities and English learners without changing the underlying measured construct. 


Forms were identified to support the following accommodations: 


Accommodated Base 1


• Spanish paper (also serves Spanish LP, Spanish human reader paper)
• Spanish human reader/human signer online
• base accommodated paper (serves braille, LP, human reader paper)
• human reader/human signer online
• assistive technology screen reader
• assistive technology non-screen reader
• American Sign Language (ASL)


Accommodated Base 2


• closed captioning
• text-to-speech first form
• Spanish online
• Spanish text-to-speech


Accommodated Base 3 (mathematics only)


• text-to-speech second form


Spanish is mathematics only. Closed captioning is ELA/L only.


At the conclusion of the meetings, all test forms were constructed to meet test blueprints and requirements, and if 
necessary, reflect the operational linking design. Each test form reflected the test blueprint in terms of content, item 
types, and test length, as well as expected difficulty and performance along the ability continuum. Linking sets were 
proportionally representative of the operational test blueprint. The operational core forms, linking set forms, and 
field-test forms were reviewed by the Forms Review Committees and approved prior to the test administration. 


Spanish-Language Assessments for Mathematics 
For English learners, the mathematics assessments are offered in Spanish, as well as in Spanish-language large print
and text-to-speech (TTS) versions. Once the operational form was approved, the form was sent to Pearson’s
subcontractor, Teneo, for transadaption of the items. Transadaption differs from translation in that it takes into 
consideration the grade-level appropriateness of the words, as well as the linguistic and cultural differences that 
exist between speakers of two different languages. Accounting for these differences allows the item to measure the 
achievement of Spanish language speakers in the same way that the original version of the item does for native 
speakers of English. The Spanish Glossary provided guidance to the translator conducting the transadaption in grade- 
level and culturally appropriate ways of transadapting the items. For the Spanish language TTS form, the alternate 
text (used for description and/or text in art and graphics) was transadapted from the alternate text for the English 
language version of the TTS form. Phonetic mark-up, which guides how the TTS reader pronounces content-specific 
words and phrases, was also applied in this process. 


In addition to the expert review of potential content for all accommodated forms conducted by the AAF OWG with 
assistance from content experts at the test construction meetings, the transadapted forms underwent additional 
quality checks: a Pearson Spanish copy edit services review and approval, and an AAF OWG review and approval. 


2.2.4 Linking Design of the Operational Test 


To support the goal of score comparability within and across administrations and years, a hybrid approach was 
implemented that incorporated the strengths of common item linking and randomly equivalent groups. The use of 
repeated operational core items was leveraged for common item linking. In addition, all forms were available 
throughout the operational administration, with spiraling at the student level, leveraged to support linking through 
randomly equivalent groups. 
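

Student-level spiraling can be sketched as a rotating assignment of forms to students. The report does not describe the algorithm used by the test delivery platform, so the round-robin rule, the function name, and the student and form identifiers below are assumptions for illustration only.

```python
from collections import Counter
from itertools import cycle


def spiral_forms(student_ids, form_ids):
    """Assign forms in rotating order so consecutive students receive different forms."""
    rotation = cycle(form_ids)
    return {student: next(rotation) for student in student_ids}


# Hypothetical roster of nine students and three core forms.
students = [f"S{i:03d}" for i in range(1, 10)]
assignment = spiral_forms(students, ["Form A", "Form B", "Form C"])

# Each form ends up with the same number of students in this example.
counts = Counter(assignment.values())
```

Because the rotation runs within each testing group, every form is taken by a similar mix of students, which is what allows the groups taking different forms to be treated as randomly equivalent.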


The operational test forms involved two types of linking: horizontal linking and across-administration linking.
Horizontal linking consisted of linking items, or common items, included in more than one form within a single administration.
Across-administration linking, or year-to-year linking, consisted of common items included in two different
administrations. The placement of linking items across forms or administrations supports the development of
comparable scores.


Linking item sets can be internal or external linking sets. Internal linking sets consist of common items in operational
positions such that the items contribute to the students’ scores. External linking sets consist of common items in 
positions resulting in the items not contributing to students’ scores. The current linking designs included internal 
linking sets. 
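

The distinction between internal and external linking sets can be made concrete with a short sketch. It assumes each form is represented as a list of items carrying an identifier and a flag for whether the item counts toward the student's score. The `FormItem` class, the function names, and the item identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormItem:
    item_id: str
    counts_toward_score: bool  # True when the item sits in an operational position


def common_items(form_a, form_b):
    """Return the identifiers of items that appear on both forms."""
    ids_b = {item.item_id for item in form_b}
    return sorted(item.item_id for item in form_a if item.item_id in ids_b)


def linking_set_type(form, linking_ids):
    """Classify a linking set by whether its items contribute to scores on this form."""
    flags = {item.counts_toward_score for item in form if item.item_id in linking_ids}
    if not flags:
        return "none"
    if flags == {True}:
        return "internal"
    if flags == {False}:
        return "external"
    return "mixed"


# Hypothetical forms sharing two scored items.
form_1 = [FormItem("M001", True), FormItem("M002", True), FormItem("M003", True)]
form_2 = [FormItem("M002", True), FormItem("M003", True), FormItem("M004", True)]

shared = common_items(form_1, form_2)
kind = linking_set_type(form_1, set(shared))
```

In this example the shared items are scored on both forms, so the set is internal, matching the design the report describes for the current administration.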


2.2.5 Field Test Data Collection Overview 


Field-test items were embedded in the spring operational forms to collect data for psychometric analysis necessary 
to support the assessment system for future administrations. Field-test administration entailed paper and computer 
administration modes, with computer administration as the dominant mode. The ELA/L unit of field-test items was
administered to a sample of students.


Field-test sets were constructed to balance the expected cognitive load and difficulty across forms, reflected in the 
number of points, distribution of task types, and balance of passages for ELA/L. Forms for each content area were 
spiraled at the student level. The data collection design entailed three conditions. Condition 1, which comprised the 
mathematics assessment, was an embedded census field-test model in which all students taking the summative 
assessment participated in the field test. 


Under Condition 2, which comprised the ELA/L assessment, approximately one-third of the schools were sampled 
across some of the participating states. Students in the sampled schools or districts took forms containing ELA/L 
embedded field-test tasks. Schools or districts were selected so that the sample for each ELA/L assessment was 
representative of the general testing populations in terms of achievement (i.e., average scale score and percentage 
of students at Level 4 and Level 5 in the previous year) and demographics (i.e., ethnicity composition, percentage of 
economically disadvantaged, English learners, and students with disabilities). The sampling plan was created such 
that if a given school was part of the ELA/L field test one year (e.g., spring 2017), it would not be required to 
participate in the field test for the subsequent two years (e.g., spring 2018 and spring 2019). 
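

The rotation rule in the sampling plan can be sketched as follows. This is a simplified illustration only: it draws a simple random sample of about one-third of schools from those not sampled in the previous two years, and it does not attempt the matching on achievement and demographics that the actual plan required. All names, the fixed seed, and the school identifiers are assumptions.

```python
import random


def eligible_schools(all_schools, last_sampled_year, year, rest_years=2):
    """Schools that were not sampled during the previous `rest_years` years."""
    return [
        school
        for school in all_schools
        if school not in last_sampled_year
        or year - last_sampled_year[school] > rest_years
    ]


def draw_sample(all_schools, last_sampled_year, year, fraction=1 / 3, seed=0):
    """Draw roughly `fraction` of all schools from the eligible pool and record the year."""
    pool = eligible_schools(all_schools, last_sampled_year, year)
    size = min(len(pool), round(len(all_schools) * fraction))
    chosen = random.Random(seed).sample(pool, size)
    for school in chosen:
        last_sampled_year[school] = year
    return sorted(chosen)


# Hypothetical population of nine schools followed over four spring administrations.
schools = [f"School {i}" for i in range(1, 10)]
history = {}
samples = {year: draw_sample(schools, history, year) for year in (2017, 2018, 2019, 2020)}
```

With nine schools, each of the first three years uses a different third of the population, and the schools sampled in 2017 become eligible again in 2020.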


For Condition 3, states or agencies may elect to field-test two ELA/L grade levels rather than all grade levels. The
grade levels selected participate in a census field test in which all students are administered the embedded field-test
items. The remaining grade levels do not participate in field testing. The selected grade levels are rotated across
years.


Section 3: Test Administration


3.1 Test Security and Administration Policies 


The administration of the summative assessment is a secure testing event. Maintaining the security of test materials 
before, during, and after the test administration is crucial to obtaining valid and reliable results. School Test 
Coordinators are responsible for ensuring that all personnel with authorized access to secure materials are trained in 
and subsequently act in accordance with all security requirements. 


School Test Coordinators must implement chain-of-custody requirements for specified materials. School Test 
Coordinators are responsible for distributing materials to Test Administrators, collecting materials from Test 
Administrators, returning secure test materials, and securely destroying certain specified materials after testing. 


The administration of the summative assessment includes both secure and nonsecure materials, and these materials 
are further delineated by whether they are “scorable” or “nonscorable,” depending on whether the assessments 
were administered via paper/pencil (i.e., paper-based assessments) or online (i.e., computer-based assessments). 
For the paper-based administration, students used paper-based answer documents (except in grade 3 where 
students responded directly into test booklets). More than 97 percent of the summative assessments administered
during the 2018-2019 administration were online assessments, and less than 3 percent were paper-based
assessments (see Tables 11.1 through 11.3).


3.1.1 Secure vs. Nonsecure Materials


Participating states and agencies define secure materials as those that must be closely monitored and tracked to 
prevent unauthorized access to or prohibited use or distribution of secure content such as test items, reading 
passages, student work, etc. For paper-based tests, secure materials include both used and unused test booklets and 
used scratch paper, while for computer-based tests, secure materials include student testing tickets, secure 
administration scripts (e.g., mathematics read-aloud), and used scratch paper. Nonsecure materials are defined as 
any authorized testing materials that do not include secure content (e.g., test items or student work). These include 
test administration manuals, unused scratch paper, and mathematics reference sheets that have not been written 
upon, etc. 


3.1.2 Scorable vs. Nonscorable Materials 


Paper-based assessments have both scorable and nonscorable materials while computer-based assessments have 
only nonscorable materials. Scorable materials for paper-based assessments consist of used (includes student work) 
test booklets (grade 3) and answer documents (grades 4 and above) only. Scorable materials must be returned to 
the vendor to be scored. All other materials for paper-based testing, such as blank (i.e., unused) test booklets, test 
administration manuals, scratch paper, mathematics reference sheets, etc., are deemed nonscorable. For computer- 
based tests, there are no scorable materials as student work is submitted electronically for scoring. Thus, there are 
limited physical materials to return (e.g., secure administration scripts for certain accommodations). 


Students taking the computer-based test may not have access to secure test materials before testing, including 
printed student testing tickets. Printed mathematics reference sheets (if applicable) and scratch paper must be new 
and unmarked.


Students taking the paper-based test may not have access to scorable or nonscorable secure test content before or
after testing. Scorable secure materials that are to be provided by Test Administrators to students include test 
booklets (grade 3) or answer documents (grades 4 through high school). Nonscorable secure materials that are 
distributed by Test Administrators to paper-based testing students include large print test booklets, braille test 
booklets, scratch paper (paper used by students to take notes and work through items), and printed mathematics 
reference sheets (grades 5 through 8 and high school). 


School Test Coordinators are required to maintain a tracking log to account for collection and destruction of test 
materials, including mathematics reference sheets and scratch paper written on by students. As part of the test 
administration policy, schools are required to maintain the Chain-of-Custody Form or tracking log of secure materials 
for at least three years unless otherwise directed by state policy. Copies of the Chain-of-Custody Form for paper- 
based testing are included in each Local Education Agency (LEA) or school’s test materials shipment. 


Test Administrators are not to have extended access to test materials before or after administration (except for 
certain accessibility or accommodations purposes). Test Administrators must document the receipt and return of all 
secure test materials (used and unused) to the School Test Coordinator immediately after testing. 


All test security and administration policies are found in the Test Coordinator Manual and the Test Administrator 
Manuals. State-specific policies are included in Appendix C of the Test Coordinator Manual. 


3.2 Accessibility Features and Accommodations 


3.2.1 Participation Guidelines for Assessments 


All students, including students with disabilities and English learners, are required to participate in statewide 
assessments and have their assessment results be part of the state’s accountability systems, with narrow exceptions 
for English learners in their first year in a U.S. school, and certain students with disabilities who have been identified 
by the Individualized Education Program (IEP) team to take their state’s alternate assessment. All eligible students 
will participate in the ELA/L and mathematics assessments. Federal laws governing student participation in 
statewide assessments include the No Child Left Behind Act of 2001 (NCLB), the Individuals with Disabilities 
Education Act of 2004 (IDEA), Section 504 of the Rehabilitation Act of 1973 (reauthorized in 2008), and the 
Elementary and Secondary Education Act (ESEA) of 1965, as amended. All students can receive accessibility features 
on the summative assessments. 


Four distinct groups of students may receive accommodations on the summative assessments: 


1. students with disabilities who have an Individualized Education Program (IEP); 


2. students with a Section 504 plan who have a physical or mental impairment that substantially limits one or 
more major life activities, have a record of such an impairment, or are regarded as having such an 
impairment, but who do not qualify for special education services; 


3. students who are English learners; and 


4. students who are English learners with disabilities who have an IEP or 504 plan. 


These students are eligible for accommodations intended for both students with disabilities and English learners. 
Testing accommodations for students with disabilities or students who are English learners must be documented 
according to the guidelines and requirements outlined in the Accessibility Features and Accommodations Manual.


3.2.2 Accessibility System


Through a combination of universal design principles and accessibility features, participating states and agencies 
designed an inclusive assessment system by considering accessibility from initial design through item development, 
field testing, and implementation of the assessments for all students, including students with disabilities, English 
learners, and English learners with disabilities. Accommodations may still be needed for some students with 
disabilities and English learners to assist in demonstrating what they know and can do. However, the accessibility 
features available to students should minimize the need for accommodations during testing and ensure the 
inclusive, accessible, and fair testing of the diverse students being assessed. 


3.2.3 What are Accessibility Features? 


On the computer-based assessments, accessibility features are tools or preferences that are either built into the 
assessment system or provided externally by Test Administrators, and may be used by any student taking the 
summative assessments (i.e., students with and without disabilities, gifted students, English learners, and English 
learners with disabilities). Since accessibility features are intended for all students, they are not classified as 
accommodations. Students should have the opportunity to select and practice using them prior to testing to 
determine which are appropriate for use on the assessment. Consideration should be given to the supports a 
student finds helpful and consistently uses during instruction. Practice tests that include accessibility features are 
available for teacher and student use throughout the year. 


3.2.4 Accommodations for Students with Disabilities and English Learners 


It is important to ensure that performance in the classroom and on assessments is influenced minimally, if at all, by a 
student’s disability or linguistic/cultural characteristics that may be unrelated to the content being assessed. For the 
summative assessments, accommodations are considered to be adjustments to the testing conditions, test format, 
or test administration that provide equitable access during assessments for students with disabilities and students 
who are English learners. In general, the administration of the assessment should not be the first occasion on which 
an accommodation is introduced to the student. To the extent possible, accommodations should: 


• provide equitable access during instruction and assessments;
• mitigate the effects of a student’s disability;
• not reduce learning or performance expectations;
• not change the construct being assessed; and
• not compromise the integrity or validity of the assessment.


Accommodations are intended to reduce and/or eliminate the effects of a student’s disability and/or English 
language proficiency level; however, accommodations should never reduce learning expectations by reducing the 
scope, complexity, or rigor of an assessment. Moreover, accommodations provided to a student on the summative 
assessments must be generally consistent with those provided for classroom instruction and classroom assessments. 
There are some accommodations that may be used for instruction and for formative assessments that are not 
allowed for the summative assessment because they impact the validity of the assessment results—for example, 
allowing a student to use a thesaurus or access the Internet during an assessment. There may be consequences (e.g., 
excluding a student’s test score) for the use of non-allowable accommodations during assessments. It is important 
for educators to become familiar with the participating states’ and agencies’ policies regarding accommodations used
for assessments.


To the extent possible, accommodations should adhere to the following principles. 


• Accommodations enable students to participate more fully and fairly in instruction and assessments and to
demonstrate their knowledge and skills.
• Accommodations should be based upon an individual student’s needs rather than on the category of a
student’s disability, level of English language proficiency alone, level of or access to grade-level instruction,
amount of time spent in a general classroom, current program setting, or availability of staff.
• Accommodations should be based on a documented need in the instruction/assessment setting and should
not be provided for the purpose of giving the student an enhancement that could be viewed as an unfair
advantage.
• Accommodations for students with disabilities must be described and documented in the student’s
appropriate plan (i.e., either a 504 plan or an approved IEP), and must be provided if they are listed.
• Accommodations for English learners should be described and documented.
• Students who are English learners with disabilities are eligible to receive accommodations for both students
with disabilities and English learners.
• Accommodations should become part of the student’s program of daily instruction as soon as possible after
completion and approval of the appropriate plan.
• Accommodations should not be introduced for the first time during the testing of a student.
• Accommodations should be monitored for effectiveness.
• Accommodations used for instruction should also be used, if allowable, on local district assessments and
state assessments.


In the following scenarios, the school must follow each state’s policies and procedures for notifying the state
assessment office:


• a student was provided a test accommodation that was not listed in his or her IEP/504
plan/documentation for an English learner, or
• a student was not provided a test accommodation that was listed in his or her IEP/504
plan/documentation for an English learner.


3.2.5 Unique Accommodations 


The Accessibility Features and Accommodations Manual provides a comprehensive list of accessibility features and
accommodations that are designed to increase access to the summative assessments and that will result in
valid, comparable assessment scores. However, students with disabilities or English learners may require additional
accommodations that are not already listed. Participating states and agencies individually review requests for unique
accommodations in their respective states and determine whether the accommodation would
result in a valid score for the student and, if so, approve the request.


3.2.6 Emergency Accommodations


An emergency accommodation may be appropriate for a student who incurs a temporary disabling condition that 
interferes with test performance shortly before or during the assessment window. A student, whether or not they 
already have an IEP or 504 plan, may require an accommodation as a result of a recently occurring accident or 
illness. Cases include a student who has a recently fractured limb (e.g., arm, wrist, or shoulder); a student whose 
only pair of eyeglasses has broken; or a student returning to school after a serious or prolonged illness or injury. An 
emergency accommodation should be given only if the accommodation will result in a valid score for the student 
(i.e., does not change the construct being measured by the test[s]). If the principal (or designee) determines that a 
student requires an emergency accommodation on the summative assessment, an Emergency Accommodation 
Form must be completed and maintained in the student’s assessment file. If required by a state, the school may 
need to consult with the state or district assessment office for approval. The parent must be notified that an 
emergency accommodation was provided. If appropriate, the Emergency Accommodation Form may also be 
submitted to the District Assessment Coordinator to be retained in the student’s central office file. Requests for 
emergency accommodations will be approved after it is determined that use of the accommodation would result in 
a valid score for the student. 


3.2.7 Student Refusal Form 


If a student refuses an accommodation listed in his or her IEP, 504 plan, or (if required by the member state) an 
English learner plan, the school should document in writing that the student refused the accommodation, and the 
accommodation must be offered and remain available to the student during testing. This form must be completed 
and placed in the student’s file and a copy must be sent to the parent on the day of refusal. Principals (or designee) 
should work with Test Administrators to determine who, if any others, should be informed when a student refuses 
an accommodation documented in an IEP, 504 plan, or (if required by the member state) English learner plan. 


3.3 Testing Irregularities and Security Breaches 


Any action that compromises test security or score validity is prohibited. These may be classified as testing 
irregularities or security breaches. Below are examples of activities that compromise test security or score validity 
(note that these lists are not exhaustive). It is highly recommended that School Test Coordinators discuss other 
possible testing irregularities and security breaches with Test Administrators during training. 


Examples of test security breaches and irregularities include but are not limited to: 


Electronic Devices 


• Using a cell phone or other prohibited handheld electronic device (e.g., smartphone, iPod, smart watch,
  personal scanner) while secure test materials are still distributed, while students are testing, after a student
  turns in his or her test materials, or during a break
• Exception: Test Coordinators, Technology Coordinators, Test Administrators, and Proctors are permitted to
  use cell phones in the testing environment only in cases of emergencies or when timely administration
  assistance is needed. LEAs may set additional restrictions on allowable devices as needed.


Test Supervision 


• Coaching students during testing, including giving students verbal or nonverbal cues, hints, suggestions, or
  paraphrasing or defining any part of the test
• Engaging in activities (e.g., grading papers, reading a book, newspaper, or magazine) that prevent proper
  student supervision at all times while secure test materials are still distributed or while students are testing
• Leaving students unattended for any period of time while secure test materials are still distributed or while
  students are testing
• Deviating from testing time procedures
• Allowing cheating of any kind
• Providing unauthorized persons with access to secure materials
• Unlocking a test in PearsonAccessnext™ during non-testing times
• Failing to provide a student with a documented accommodation or providing a student with an
  accommodation that is not documented and therefore is not appropriate
• Allowing students to test before or after the state’s test administration window


Test Materials 


• Losing a student test booklet or answer document
• Losing a student testing ticket
• Leaving test materials unattended or failing to keep test materials secure at all times
• Reading or viewing the passages or test items before, during, or after testing
• Exception: Administration of a human reader/signer accessibility feature for mathematics or
  accommodation for English language arts/literacy, which requires a Test Administrator to access passages
  or test items
• Copying or reproducing (e.g., taking a picture of) any part of the passages or test items or any secure test
  materials or online test forms
• Revealing or discussing passages or test items with anyone, including students and school staff, through
  verbal exchange, email, social media, or any other form of communication
• Removing secure test materials from the school’s campus or removing them from locked storage for any
  purpose other than administering the test


Testing Environment 


• Allowing unauthorized visitors in the testing environment
• Failing to follow administration directions exactly as specified in the Test Administrator Manual
• Displaying testing aids in the testing environment (e.g., a bulletin board containing relevant instructional
  materials) during testing


All instances of security breaches and testing irregularities must be reported to the School Test Coordinator
immediately. The Form to Report a Testing Irregularity or Security Breach must be completed within two school days
of the incident.


If any situation occurred that could cause any part of the test administration to be compromised, schools should
refer to the Test Coordinator Manual for each state’s policy and immediately follow those steps. Instructions for the
School Test Coordinator or LEA Test Coordinator to report a testing irregularity or security breach are available in the
Test Coordinator Manual.
3.4 Data Forensics Analyses


Maintaining the validity of test scores is essential in any high-stakes assessment program, and misconduct 
represents a serious threat to test score validity. When used appropriately, data forensic analyses can serve as an 
integral component of a wider test security protocol. The results of these data forensic analyses may be 
instrumental in identifying potential cases of misconduct for further follow-up and investigation. 


The following data forensics analyses were conducted on the operational assessments: 


• Response Change Analysis
• Aberrant Response Analysis
• Plagiarism Analysis
• Longitudinal Performance Modeling
• Internet and Social Media Monitoring
• Off-Hours Testing Monitoring


An overview of each data forensics analysis method is provided next. 


3.4.1 Response Change Analysis 


Response change analysis looks at how often student answers are changed, focusing specifically on an excessive 
number of wrong answers changed to right answers. In traditional paper-based, multiple-choice testing programs, 
this is sometimes referred to as “erasure analysis.”¹ The rationale for erasure analysis is that a teacher or
administrator who is intent on improving classroom performance might be motivated to change student responses 
after the answer sheets are collected. A clustered number of student answer documents from the same school or 
classroom with unusually high numbers of answers changed from wrong to right might provide evidence to support 
follow-up investigation. The response change analysis extended the traditional erasure method to account for issues 
specific to computer-based testing as well as the variety of item types on the summative assessments, such as 
partial-credit, multi-part, and multiple-select items. 
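

The counting step behind a response change analysis can be sketched as follows. This is a minimal illustration only, not the operational procedure: the input layout, the function names, and the z-score cutoff are all assumptions made for the example, and the sketch handles only single-key items rather than the partial-credit, multi-part, and multiple-select items mentioned above.

```python
from statistics import mean, pstdev


def count_wrong_to_right(changes, key):
    """Count answer changes that moved from an incorrect option to the keyed option.

    changes: list of (item_id, old_answer, new_answer) for one student (hypothetical layout).
    key: dict mapping item_id to the keyed (correct) answer.
    """
    return sum(1 for item, old, new in changes if old != key[item] and new == key[item])


def flag_classrooms(wtr_by_classroom, z_cutoff=2.0):
    """Flag classrooms whose mean wrong-to-right count is unusually high.

    wtr_by_classroom: dict mapping classroom to a list of per-student counts.
    z_cutoff is an illustrative threshold, not a value taken from the report.
    """
    means = {room: mean(counts) for room, counts in wtr_by_classroom.items()}
    mu = mean(means.values())
    sd = pstdev(means.values())
    if sd == 0:
        return []  # no variation between classrooms, nothing to flag
    return sorted(room for room, m in means.items() if (m - mu) / sd > z_cutoff)
```

A flag produced this way would only be a prompt for follow-up investigation, in line with the caution above, and not evidence of misconduct in itself.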


3.4.2 Aberrant Response Analysis 


Aberrant response pattern detection analysis looks at the unusualness of student responses compared with what 
would be expected. Most simply, this can be thought of as quantifying the extent to which higher-scoring students 
miss easy questions and lower-scoring students answer difficult questions correctly. While it would be difficult to 
draw a definitive inference about a single student flagged as having an aberrant response pattern, a cluster of 
students with aberrant response patterns within a classroom or school might warrant further investigation. 
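

One simple way to quantify this kind of unusualness is a count of Guttman errors: the number of item pairs in which a student missed the easier item but answered the harder item correctly. The report does not name the statistic used operationally, so the sketch below is an assumed stand-in for illustration, with item easiness represented by the proportion of students answering correctly.

```python
def guttman_errors(responses, proportion_correct):
    """Count (easier item wrong, harder item right) pairs for one student.

    responses: list of 0/1 item scores.
    proportion_correct: list of item p-values in the same order (higher means easier).
    Items with tied p-values are ordered arbitrarily in this sketch.
    """
    # Visit items from easiest to hardest.
    order = sorted(range(len(responses)), key=lambda i: proportion_correct[i], reverse=True)
    errors = 0
    easier_items_missed = 0
    for i in order:
        if responses[i] == 0:
            easier_items_missed += 1
        else:
            # Each correct answer forms an error pair with every easier item missed so far.
            errors += easier_items_missed
    return errors
```

A response pattern that follows item difficulty exactly yields zero errors; larger counts indicate more aberrant patterns.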


¹ The term “erasure analysis” is sometimes objected to because it is inferential rather than descriptive. A more descriptive term is
“mark discrimination analysis,” which recognizes that the scanning approach makes discriminations among the darkness of 
selected answer choices when multiple responses to a multiple-choice item are detected during answer sheet processing. 
3.4.3 Plagiarism Analysis


Plagiarism analysis compares the responses given for a group of written composition items, looking for high degrees 
of similarity. For the summative assessments, the primary item type of interest was the prose constructed-response 
(PCR) tasks in the English language arts/literacy (ELA/L) content area. This analysis was conducted for PCR tasks 
administered online using some of the same artificial intelligence (AI) techniques that are applied in automated
essay scoring. Specifically, this method was based on Latent Semantic Analysis (LSA) technology to detect possible
plagiarism. Using LSA, the content of each constructed response was compared against the content of every other
constructed response, and a measure indicating the degree of similarity was generated for each pair of
responses. Because LSA provided a semantic representation of language, rather than a syntactic or word-based
representation, it allowed the detection of potential copying behaviors, even when students or administrators
substituted synonymous words or phrases.
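

The all-pairs comparison can be illustrated with a deliberately simplified sketch. It substitutes plain word-count vectors and cosine similarity for LSA, so unlike the method described above it would not detect synonym substitution; it shows only the pairwise structure of the comparison. The function names and the 0.9 threshold are assumptions for the example.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counter objects)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_pairs(responses, threshold=0.9):
    """Compare every response with every other and return the highly similar pairs.

    responses: dict mapping a response id to its text.
    threshold is an illustrative cutoff, not a value from the report.
    """
    vectors = {rid: Counter(text.lower().split()) for rid, text in responses.items()}
    flagged = []
    for first, second in combinations(sorted(vectors), 2):
        score = cosine(vectors[first], vectors[second])
        if score >= threshold:
            flagged.append((first, second, round(score, 3)))
    return flagged
```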


3.4.4 Longitudinal Performance Monitoring 


Longitudinal performance modeling evaluates the performance on the summative assessments across test 
administrations and identifies unusual performance gains in the unit of interest (e.g., school or district). A Weighted 
Least Squares (WLS) regression methodology was evaluated and recommended by the Technical Advisory 
Committee (TAC) for implementation starting spring 2017. The WLS identified unusual changes in test performance 
across two consecutive administrations of the assessment. In the WLS regression approach, mean current year scale 
scores are regressed on mean prior year scale scores, weighting by unit sample size. Standardized residuals are 
calculated by dividing raw residuals by their respective standard deviations. Units with a standardized residual 
exceeding 3.0 are flagged for unexpected performance. 
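

The regression and flagging rule can be sketched as below. This is an illustration under stated assumptions, not the operational implementation: it takes the weight for each unit to be its sample size, treats the standard deviation of a unit's residual as sqrt(sigma² / n) with no leverage adjustment, and flags only positive residuals because the analysis targets unusual gains. The function and variable names are hypothetical.

```python
from math import sqrt


def wls_standardized_residuals(units):
    """Regress current-year means on prior-year means, weighting by unit size.

    units: list of (name, prior_mean, current_mean, n_students); needs at least
    three units with differing prior means.
    Returns a dict mapping unit name to its standardized residual.
    """
    w = [n for _, _, _, n in units]
    x = [prior for _, prior, _, _ in units]
    y = [current for _, _, current, _ in units]
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Weighted residual variance with two estimated parameters.
    sigma2 = sum(wi * e * e for wi, e in zip(w, residuals)) / (len(units) - 2)
    result = {}
    for (name, *_), wi, e in zip(units, w, residuals):
        sd = sqrt(sigma2 / wi)  # assumed residual SD for a unit of this size
        result[name] = e / sd if sd > 0 else 0.0
    return result


def flag_units(units, cutoff=3.0):
    """Return the names of units whose standardized residual exceeds the cutoff."""
    residuals = wls_standardized_residuals(units)
    return sorted(name for name, r in residuals.items() if r > cutoff)
```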


3.4.5 Internet and Social Media Monitoring 


Internet and social media monitoring was conducted by Caveon, LLC. Caveon’s team monitored English-language
websites and searchable forums that were publicly available for suspected proxy testing solicitations and website
postings that contained, or appeared to contain, infringements of protected operational test content. The Internet and
social media outlets monitored included popular websites (such as Facebook and Twitter), blogs, discussion forums,
video archives, document archives, brain dumps, auction sites, media outlets, peer-to-peer servers, etc. Caveon’s
process generated regular updates that categorized identified threats by level of actual or potential risk based upon
the representations made on the websites, or actual analysis of the proffered content. For example, categorizations 
typically ranged from “cleared” (lowest risk but bookmarked for continued monitoring) to “severe” (highest risk). 
Note that this process only considered potential breaches of secure item content, not violations of testing 
administration policies. Potential breaches were reported directly to the state(s) implicated for further action. 
Summary reports describing the threats were provided through notification emails. 


3.4.6 Off-Hours Testing Monitoring 


Off-hours testing monitoring checks for suspicious testing activities at test administration locations occurring outside 
of the set windows for computer-based testing sessions. Participating states and agencies established set start and 
end times for administering computer-based assessments. Based on these hours, authorized users (that is, users 
with the State Role) were allowed to override the start and end times for a test session. The off-hours testing 
monitoring process tracked such occurrences and logged them in an operational report, which listed the sessions 
within an organization that selected to test outside the set window. States could use this report to follow up with
the organizations identified in the report. 
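

The core check in such a report can be sketched as follows. The session record layout, the function name, and the example window are assumptions for illustration; the sketch also assumes each session starts and ends on the same calendar day.

```python
from datetime import datetime, time


def off_hours_sessions(sessions, window_start, window_end):
    """List (organization, session) pairs that ran outside the set testing hours.

    sessions: list of dicts with 'org', 'session', 'started', and 'ended' keys,
    where 'started' and 'ended' are datetime values (hypothetical layout).
    window_start, window_end: datetime.time values set by the state or agency.
    """
    report = []
    for s in sessions:
        began_early = s["started"].time() < window_start
        ended_late = s["ended"].time() > window_end
        if began_early or ended_late:
            report.append((s["org"], s["session"]))
    return report
```

For example, with an assumed window of 7:00 to 16:00, a session that started at 18:15 would appear in the report while a session running from 9:00 to 10:30 would not.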
Section 4: Item Scoring


4.1 Machine-Scored Items 


4.1.1 Key-Based Items 


Pearson performed a key review prior to the test administration to verify that the scoring (answer) keys were correct 
for each item. Once the forms were constructed and approved for publication, an independent key review was 
performed by an experienced third-party vendor. The vendor reviewed each item and confirmed that the key was 
correct. If discrepancies were identified, a Pearson senior content specialist or content manager reviewed the 
flagged item(s) and worked with the item developers to resolve the issue. 


4.1.2 Rule-Based Items 


Rule-based scoring refers to item types that use various scoring models. Participating states and agencies use 
Question and Test Interoperability (QTI) item type implementation based on scoring model rules. Examples of these 
item types include “choice interaction,” which presents a set of choices where one or more choices can be selected; 
text entry, where the response is entered in a text box; hot spot or text interaction, where an area in a graph or text 
in a paragraph (for example) can be highlighted; or match interaction, where an association can be made between 
pairs of choices in a set. These items include the scoring rules and correct responses as part of their item XML 
(markup language) coding. 


During the initial stages of item development, Pearson staff worked closely with participating states and agencies to 
first delineate the rules for the scoring rubrics and then to adjust those rules based on student responses. During 
item studies in spring 2015, Pearson content staff received input from the staff of participating states and agencies 
to develop a thorough rule-based scoring process that met their needs. 


Pearson worked with the item developers to review initial scoring rules created during the item development. Once 
the rule-based scoring process was approved, and prior to test construction, Pearson content staff worked closely 
with the item developers to finalize scoring rubrics for items to be scored via the rule-based scoring method. The 
proposed scoring rubrics were sent for review, and if any additional changes were needed or new rules added, 
Pearson documented and applied the requested edits. 


During test construction, Pearson monitored and evaluated the scoring and updated the scoring keys/scoring rules
in the item bank. After the tryout items were scored, Pearson prepared a frequency distribution of student 
responses for each item or task scored using a rule-based approach and compared this to the expected response 
based on correct answers to ensure that scoring keys and rules were appropriately applied. The content team 
analyzed the student response data to determine if scoring was acceptable using the item metadata and the student 
response file in conjunction with any potential item issues as flagged by psychometrics. These frequency 
distributions included an indication of right/wrong and other identifying information defined by participating states 
and agencies, and those items that showed a statistical anomaly, whereby the frequency distribution was outside of 
the expected range, were sent to content experts to verify that the items were coded with the correct key. 


Following the Rule-Based Scoring Educator Committee’s review, which occurred prior to year one test construction, 
Pearson analyzed the feedback from the committees and made recommendations about adjustments to the scoring 
rubrics based on the results of the reviews. Upon submission of the results, Pearson worked with the staff of
participating states and agencies to discuss these findings and determine next steps prior to the completion of 
scoring. In subsequent years as scoring inquiries arise throughout the process of test construction, forms creation, 
testing, scoring, and psychometric analysis, items with scoring discrepancies are brought before the Priority Alert 
Task Force for resolution. This committee consists of representatives from each state as well as the content 
specialists at participating states and agencies and Pearson. 


Following the initial development of the rule-based scoring rubrics, Pearson has continued to monitor and evaluate 
new item development to ensure the scoring rules established are maintained within all item types as approved. 


Pearson continues to use several avenues to monitor scoring each year. Prior to testing, a third-party key review 
checks operational and field test items for correct keys. Any disputed items go to a second review with Pearson 
content experts and anything still in question is taken before the task force for review and possible key change. 
During testing, Pearson creates early testing files for frequency distribution analysis whereby items for which an 
incorrect key receives a high distribution of responses are further evaluated for accuracy. After testing, all responses 
are again evaluated for the distribution of responses and potential scoring abnormalities during psychometric 
analysis. Any change in scoring that may be requested as a result of the psychometric analysis is also taken before 
the Priority Alert Task Force for decisions. These processes are the same for both paper and online modes of testing. 
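

A frequency distribution check of the kind described for early testing files can be sketched as below. The flagging rule used here, that some non-keyed option was selected more often than the keyed option, is an assumption chosen for illustration; the report does not state the operational criterion. The names are hypothetical.

```python
from collections import Counter


def suspect_keys(responses, keys):
    """Flag items where a non-keyed option drew more responses than the key.

    responses: dict mapping item id to a list of selected options.
    keys: dict mapping item id to the keyed option.
    Returns a dict mapping each flagged item id to its most popular option.
    """
    flagged = {}
    for item, picks in responses.items():
        if not picks:
            continue  # no data yet for this item
        counts = Counter(picks)
        option, n = counts.most_common(1)[0]
        if option != keys[item] and n > counts[keys[item]]:
            flagged[item] = option
    return flagged
```

An item flagged this way would still go to content experts for review, since a popular distractor can also reflect a genuinely difficult item rather than a miskey.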


4.2 Human or Handscored Items 


Constructed-response items were scored by human scorers in a process referred to as handscoring. Online training 
units were used to train all scorers. The online training units included prompts (items), passages, rubrics, training 
sets, and qualification sets. Scorers who successfully completed the training and qualified, demonstrating they could 
correctly score student responses based on the guidelines in the online training units, were permitted to score 
student responses using the ePEN2 (Electronic Performance Evaluation Network, second generation) scoring 
platform. All online and paper responses were scored within the ePEN2 system. Pearson monitored quality 
throughout scoring. 


Pearson staff roles and responsibilities were as follows: 


• Scorers applied scores to student responses.
• Scoring supervisors monitored the work of a team of scorers through review of scorer statistics and
  backreading, which is a review of responses scored by each scorer. When backreading, a supervisor sees the
  scores applied by scorers, which helps the supervisor provide additional coaching or instruction to the
  scorer being backread.
• Scoring directors managed the scoring quality of a subset of items and monitored the work of supervisors
  and scorers for their assigned items. Directors backread responses scored by supervisors and scorers as part
  of their quality-monitoring duties.
• English language arts/literacy (ELA/L) and mathematics content specialists managed the scoring quality and
  monitored the work of the scoring directors.
• Project managers documented the procedures, identified risks, and managed day-to-day administrative
  matters.
• A program manager provided oversight for the entire scoring process.
All Pearson employees involved in the scoring or the supervision of scoring possessed at least a four-year college
degree. 


4.2.1 Scorer Training 


Key steps in the development of scorer training materials were rangefinding and rangefinder review meetings where 
educators and administrators from states met to interpret the scoring rubrics and determine consensus scores for 
student responses. Rangefinding meetings were held prior to scoring field-test items, and rangefinder review 
meetings were held prior to scoring operational items. 


At rangefinding meetings, educators and administrators from states reviewed student responses and used scoring 
rubrics to determine consensus scores. Those responses scored in rangefinding were used to create field-test scorer 
training sets. After items were selected for operational testing, educators and administrators attended rangefinder 
review meetings to review and approve proposed operational scorer training sets. 


When developing scorer training materials, Pearson scoring directors carefully reviewed detailed notes and records 
from rangefinding and rangefinder review committee meetings. Training sets were developed using the responses 
scored by the committees and additional suitable student response samples (as needed). All scorer training sets 
were reviewed and approved prior to scorer training. 


During training, scorers reviewed training sets of scored student responses with annotations that explained the 
rationale for the score assigned. The anchor set was the primary reference for scorers as they internalized the rubric 
during training. Each anchor set consisted of responses that were clear examples of student performance at each 
score point. The responses selected were representative of typical approaches to the task and arranged to reflect a 
continuum of performance. All scorers had access to the anchor set when they were training and scoring and were 
directed to refer to it regularly during scoring. 


Practice sets were used in training to help trainees practice applying the scoring guidelines. Scorers reviewed the 
anchor sets, scored the practice sets, and then were able to compare their assigned scores for the practice sets to 
the actual assigned scores to help them learn. 


Qualification sets were used to confirm that scorers understood how to score student responses accurately. 
Qualification sets were composed of responses that were clear examples of score points. Scorers were required to 
meet specified agreement percentages on qualification sets in order to score student responses. 


Pearson has developed two types of training sets to train scorers: prototype and abbreviated sets. Prototype training 
sets were complete training sets consisting of anchor, practice, and qualification sets (refer to 4.2.2 for information 
on the qualification process). In ELA/L, there was one prototype training set per task type (Research Simulation Task, 
Literary Analysis Task, and Narrative Writing Task) at each of the nine grade levels (grades 3 through 11). In 
mathematics, a prototype training set was built for a grouping of similar items for a total of approximately three to 
five prototype sets per grade level or course. 


The prototype training approach promoted consistency in scoring, as each subsequent abbreviated training set for 
the ELA/L task type or mathematics item grouping was based on the prototype. Once a prototype was chosen, full 
training materials were developed for that item, and at each grade level, scorers were trained to score a particular 
task type using the prototype training materials for that type. 
Abbreviated training sets were prepared for all items not selected for prototype training sets. The abbreviated
training sets included an anchor set and two practice sets so scorers could internalize the scoring standards for these 
new items, which were similar to prototype items they had previously scored. 


Anchor and practice sets for both prototype and abbreviated items included annotations for each response. 
Annotations are formal written explanations of the score for each student response. 


Table 4.1 details the composition of the anchor sets, practice sets, and qualification sets. 


Table 4.1 Training Materials Used During Scoring


Training Set Development: Anchor Set

Description: The anchor set is the primary reference for scorers as they internalize the rubric during
training. All scorers have access to the anchor set when they are training and scoring, and are
directed to refer to it regularly. The anchor set comprises clear examples of student performance at
each score point. The responses selected may be representative of typical approaches to the task or
arranged to reflect a continuum of performance.

Specification:
• The anchor set for mathematics prototype items comprises three annotated responses per score point.
• The anchor set for subsequent abbreviated items for mathematics comprises one to three annotated
  responses per score point.
• The anchor sets for ELA/L prototype items comprise three annotated responses per score point. Anchor
  sets for prototype items include separate complete anchor sets for each applicable scoring trait
  (Reading Comprehension and Written Expression and Conventions [RCWE] for Research Simulation and
  Literary Analysis Tasks, Written Expression [WE] for Narrative Writing Tasks, and Knowledge of Language
  and Conventions for all task types).


Training Set Development: Practice Sets

Description: Practice sets are used to help trainees develop experience in independently applying the
scoring guide (the rubric) to student responses. Some of these responses clearly reinforce the scoring
guidelines presented in the anchor set. Other responses are selected because they are more difficult to
evaluate, fall near the boundary between two score categories, or represent unusual approaches to the
task. The practice sets provide guidance and practice for trainees in defining the line between score
categories, as well as applying the scoring criteria to a wider range of types of responses.

Specification:
• The practice sets for mathematics prototype and abbreviated items include two to three sets of ten
  annotated responses.
• ELA/L practice sets for prototype items include two sets of five annotated responses and two sets of
  ten annotated responses.
• The subsequent ELA/L practice sets for abbreviated items include two sets of ten annotated responses.


Training Set Development: Qualification Sets

Description: Qualification sets are used to confirm that scorer trainees understand the scoring criteria
and are able to assign scores to student responses accurately. The responses in these sets are selected
to reinforce the application of the scoring criteria illustrated in the anchor set. Scorer trainees must
demonstrate acceptable performance on these sets by meeting a pre-determined standard for accuracy in
order to qualify to score. Pearson scoring staff define and document qualifying standards in conjunction
with participating states and agencies prior to scoring.

Specification:
• The qualification sets for mathematics prototype items include three sets of ten responses each (not
  annotated).
• The subsequent abbreviated items for mathematics do not include qualification sets.
• The qualification sets for ELA/L prototype items include three sets of ten responses each (not
  annotated).
• The subsequent ELA/L abbreviated items do not include qualification sets.


4.2.2 Scorer Qualification 


In order to score items, scorers were required to show that they were able to apply scoring methodology accurately 
through a qualification process. Scorers were asked to apply scores to three qualification sets consisting of ten 
responses each. ELA/L scorers applied a score for each trait on each response in the qualification sets. Literary 
Analysis and Research Simulation Tasks each had two traits: the Reading Comprehension and Written Expression 
trait and the Conventions trait. The Narrative Writing Task had two traits: Written Expression and Conventions. 
Mathematics scorers applied a score for each part of an item that was a constructed response. The number of 
constructed-response parts for each mathematics item ranged from one to four. Scorers were required to match the 
approved score at a percentage agreed to by participating states and agencies in order to qualify. 


For ELA/L qualification, scorers were required to meet the following three conditions: 


1. On at least one of the three qualifying sets, at least 70 percent of the ratings on each of the two scoring
   traits (considered separately) must agree exactly with the approved scores.

2. On at least two of the three qualifying sets, at least 70 percent of the ratings (combined across the two
   scoring traits) must agree exactly with the approved scores.

3. Combining over the three qualifying sets and across the two scoring traits, at least 96 percent of the ratings
   must be within one point of the approved scores.
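

The three ELA/L conditions can be expressed as a short check. The sketch below is illustrative only: the data layout (one dictionary per qualifying set, keyed by trait name, holding pairs of scorer rating and approved score) and the function names are assumptions, and percentages are compared with integer arithmetic to avoid rounding issues.

```python
def meets(pairs, percent, tolerance=0):
    """True if at least `percent` of ratings are within `tolerance` of the approved score."""
    hits = sum(abs(rating - approved) <= tolerance for rating, approved in pairs)
    return 100 * hits >= percent * len(pairs)


def ela_qualifies(qualifying_sets):
    """Apply the three ELA/L qualification conditions.

    qualifying_sets: list of three dicts, each mapping a trait name to a list of
    (scorer_rating, approved_score) pairs for that set (hypothetical layout).
    """
    combined = [[p for pairs in s.values() for p in pairs] for s in qualifying_sets]
    # Condition 1: in at least one set, each trait separately reaches 70 percent exact agreement.
    cond1 = any(all(meets(pairs, 70) for pairs in s.values()) for s in qualifying_sets)
    # Condition 2: in at least two sets, the traits combined reach 70 percent exact agreement.
    cond2 = sum(meets(c, 70) for c in combined) >= 2
    # Condition 3: across all sets and traits, 96 percent of ratings are within one point.
    everything = [p for c in combined for p in c]
    cond3 = meets(everything, 96, tolerance=1)
    return cond1 and cond2 and cond3
```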


For mathematics qualification, the requirements were based on the item types and score point ranges. Because 
mathematics items can have one or more scoring traits, a scorer needed to achieve the following requirements 
separately for each scoring trait (when applicable to the item): 
Table 4.2 Mathematics Qualification Requirements


Category   Score Point Range   Perfect Agreement   Within One Point
2          0-1                 90%                 100%
3          0-2                 80%                 96%
4          0-3                 70%                 96%
5          0-4                 70%                 95%
6          0-5                 70%                 95%
7          0-6                 70%                 95%


On at least two of the three qualifying sets, a scorer was required to meet the “perfect agreement” percentage 
indicated in the table above for each category. “Perfect agreement” was achieved when the scores applied exactly 
matched the approved scores. Over the three qualifying sets, a scorer was required to meet the “within one point” 
percentage indicated in the table above for each category. The average is exclusive to each trait, so an item with 
multiple scoring traits would have multiple trait rating averages within one point of the approved score. 
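

The mathematics rule can be sketched for a single scoring trait as follows, using the percentages from Table 4.2. The data layout and names are assumptions for illustration, and the "within one point" requirement is interpreted here as applying to the ratings pooled over the three sets, which equals the average across sets when the sets are the same size.

```python
# (perfect agreement %, within one point %) by category, taken from Table 4.2.
MATH_REQUIREMENTS = {
    2: (90, 100),
    3: (80, 96),
    4: (70, 96),
    5: (70, 95),
    6: (70, 95),
    7: (70, 95),
}


def math_trait_qualifies(category, qualifying_sets):
    """Check the qualification rule for one scoring trait.

    qualifying_sets: three lists of (scorer_rating, approved_score) pairs
    (hypothetical layout).
    """
    perfect, within_one = MATH_REQUIREMENTS[category]
    # Perfect agreement must be met on at least two of the three sets.
    sets_passed = sum(
        100 * sum(r == a for r, a in s) >= perfect * len(s) for s in qualifying_sets
    )
    # The within-one-point requirement is checked over all three sets together.
    pooled = [pair for s in qualifying_sets for pair in s]
    close = sum(abs(r - a) <= 1 for r, a in pooled)
    return sets_passed >= 2 and 100 * close >= within_one * len(pooled)
```

For an item with several scoring traits, the check would be run once per trait, and the scorer would need to pass each one.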


4.2.3 Managing Scoring 


Pearson created a handscoring specifications document that detailed the handscoring schedule, customer 
requirements, rangefinding plans, quality management plans, item information, and staffing plans for each scoring 
administration. 


4.2.4 Monitoring Scoring 


Second Scoring 
During scoring, Pearson’s ePEN2 scoring system automatically and randomly distributed a minimum of 10 percent of
student responses for second scoring; scorers had no indication whether a response had been scored previously.
Humans applied the second score for all mathematics items. Second scoring for ELA/L was performed either by 
human scorers or by the Intelligent Essay Assessor. If the first and second scores applied were nonadjacent, a third 
and occasionally a fourth score was assigned to resolve scorer disagreements. When a resolution score (i.e., third 
score) was nonadjacent to one or both of the first and second scores, the content specialist or scoring director would 
apply an adjudication score (fourth score). 
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Table 4.3 Scoring Hierarchy Rules 
If a response was scored more than once, the following rules were applied to determine the final score: 


Score Type Rank Final Score Calculation 

Adjudication 1 If an adjudication score is assigned, this is the final score. 

Resolution 2 If no adjudication score is assigned, this is the final score. 

Backread 3 If no adjudication or resolution score is assigned, the latest backreading 
score is the final score. 

Human First Score 4 If no adjudication, resolution, or backreading score is assigned, this is 
the final score. 

Human Second Score 5 If no adjudication, resolution, backreading, or human first score is 
assigned, this is the final score. 

Intelligent Essay Assessor 6 If no human score is assigned, this is the final score. 

Score 
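

The ranking in Table 4.3 behaves like a first-match lookup over score types. The following is a minimal Python sketch of that reading, not Pearson's implementation: the dictionary keys, the function name, and the convention that each list holds scores in the order they were assigned are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Score types in rank order, transcribed from Table 4.3 (key names are hypothetical).
HIERARCHY = [
    "adjudication",
    "resolution",
    "backread",
    "human_first",
    "human_second",
    "iea",
]


def final_score(scores: Dict[str, List[int]]) -> Optional[int]:
    """Return the final score for one response under the Table 4.3 ranking.

    `scores` maps a score type to the scores of that type, oldest first.
    The highest-ranked type that has any score wins; within a type the
    latest score is used (Table 4.3 states this only for backreading).
    """
    for score_type in HIERARCHY:
        assigned = scores.get(score_type)
        if assigned:
            return assigned[-1]
    return None  # no score of any type has been assigned


if __name__ == "__main__":
    response = {"human_first": [2], "human_second": [4], "resolution": [3]}
    print(final_score(response))
```

In this example the first and second scores are nonadjacent, so a resolution score exists and outranks both human scores.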

Backreading 


Backreading was one of the major responsibilities of Pearson Scoring Supervisors and a primary tool for proactively 
guarding against scorer drift, where scorers score responses in comparison to one another instead of in comparison 
to the training responses. Scoring supervisory staff used the ePEN2 backreading tool to review scores assigned to 
individual student responses by any given scorer in order to confirm that the scores were correctly assigned and to 
give feedback and remediation to individual scorers. Pearson backread approximately 5 percent of the handscored 
responses. Backreading scores did not override the original score but were used to monitor scorer performance. 


Validity 

Validity responses are pre-scored responses strategically interspersed in the pool of live responses. These responses 
were not distinguishable from any other responses so that scorers were not aware they were scoring validity 
responses rather than live responses. The use of validity responses provided an objective measure that helped 
ensure that scorers were applying the same standards throughout the project. In addition, validity was at times 
shared with scorers in a process known as “validity as review.” Validity as review provided scorers automated, 
immediate feedback: a chance to review responses they mis-scored, with reference to the correct score and a brief 
explanation of that score. One validity response was sent to scorers for every 25 “live” responses scored. 


Validity agreement requirements for scorers are listed in Table 4.4. Scorers had to meet the required validity 
agreement percentages to continue working on the project. Scorers who did not maintain expected agreement 
statistics were given a series of interventions culminating in a targeted calibration set: a test of scorer knowledge. 
Scorers who did not pass targeted calibration were removed from scoring the item, and all the scores they assigned 
were deleted. 


Table 4.4 Scoring Validity Agreement Requirements 


Subject       Score Point Range   Perfect Agreement   Within One Point*
Mathematics   0-1                 90%                 96%
Mathematics   0-2                 80%                 96%
Mathematics   0-3                 70%                 96%
Mathematics   0-4                 65%                 95%
Mathematics   0-5                 65%                 95%
Mathematics   0-6                 65%                 95%
ELA/L         Multi-trait         65%                 96%


*A zero or 1 score compared to a blank score will have a disagreement greater than 1 point. 
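

As an illustration of how the Table 4.4 requirements could be checked for one scorer, the sketch below computes perfect and within-one-point agreement against the pre-assigned validity scores. It is a hedged example under stated assumptions: the function names and the dictionary layout are hypothetical, a blank is represented as `None`, and treating two blanks as agreeing is an assumption the report does not address.

```python
from typing import Optional, Sequence, Tuple

# (perfect agreement %, within one point %) transcribed from Table 4.4.
REQUIREMENTS = {
    "0-1": (90.0, 96.0),
    "0-2": (80.0, 96.0),
    "0-3": (70.0, 96.0),
    "0-4": (65.0, 95.0),
    "0-5": (65.0, 95.0),
    "0-6": (65.0, 95.0),
    "multi-trait": (65.0, 96.0),
}

Score = Optional[int]  # None stands for a blank score


def validity_agreement(assigned: Sequence[Score],
                       approved: Sequence[Score]) -> Tuple[float, float]:
    """Return (perfect %, within-one-point %) for a scorer's validity responses."""
    if not assigned or len(assigned) != len(approved):
        raise ValueError("need two non-empty score lists of equal length")
    perfect = within_one = 0
    for given, true in zip(assigned, approved):
        if given is None or true is None:
            # Per the table footnote, blank vs. a numeric score is more than
            # one point apart. Blank vs. blank is counted as agreement (assumption).
            if given is None and true is None:
                perfect += 1
                within_one += 1
            continue
        gap = abs(given - true)
        perfect += gap == 0
        within_one += gap <= 1
    n = len(assigned)
    return 100.0 * perfect / n, 100.0 * within_one / n


def meets_requirement(assigned: Sequence[Score], approved: Sequence[Score],
                      score_range: str) -> bool:
    """True if both agreement rates reach the Table 4.4 minimums."""
    perfect, within_one = validity_agreement(assigned, approved)
    need_perfect, need_within = REQUIREMENTS[score_range]
    return perfect >= need_perfect and within_one >= need_within
```

For example, `validity_agreement([2, 2, 1, 0, None], [2, 1, 1, 0, 0])` counts three exact matches and four scores within one point out of five responses.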


Calibration Sets 
Calibration sets are special sets created during scoring to help train scorers on particular areas of concern or focus. 
Scoring directors used calibration sets to reinforce rangefinding standards, introduce scoring decisions, or address 
scoring issues and trends. Calibration was used either to correct a scoring issue or trend, or to continue scorer 
training by introducing a scoring decision. Calibration was administered regularly throughout scoring. 


Inter-rater Agreement 
Inter-rater agreement is the agreement between the first and second scores assigned to student responses and is 
the measure of how often scorers agree with each other. Pearson scoring staff used inter-rater agreement statistics 
as one factor in determining the needs for continuing training and intervention on both individual and group levels. 
Inter-rater agreement expectations are shown in Table 4.5. 


Table 4.5 Inter-rater Agreement Expectations and Results 


                                  Perfect       Perfect     Within One     Within One
              Score Point         Agreement     Agreement   Point          Point
Subject       Range               Expectation   Result      Expectation*   Result
Mathematics   0-1                 90%           98%         100%           100%
Mathematics   0-2                 80%           97%         100%           100%
Mathematics   0-3                 70%           95%         100%           99%
Mathematics   0-4                 65%           94%         99%            99%
Mathematics   0-5                 65%           93%         99%            98%
Mathematics   0-6                 65%           95%         99%            98%
ELA/L         Multi-trait         65%           80%         100%           99%


*A zero or 1 score compared to a blank score will have a disagreement greater than 1 point. 


Pearson’s ePEN2 scoring system included comprehensive inter-rater agreement reports that allowed supervisory 
personnel to monitor both individual and group performance. Based on reviews of these reports, scoring experts 
targeted individuals for increased backreading and feedback, and if necessary, retraining. 


The perfect agreement rate for mathematics responses scored by two scorers ranged from 93 to 98 percent and the 
within one point rate ranged from 98 to 100 percent. For all ELA/L responses scored by two scorers, the perfect 
agreement rate was 80 percent and the within one point rate was 99 percent. 


The results by grade level for ELA/L are provided in Section 4.3.7: Inter-rater Agreement for Prose Constructed 
Response. 


4.3 Automated Scoring for PCRs 


Automated scoring performed by Pearson’s Intelligent Essay Assessor (IEA) was the default option for scoring the 
summative assessment’s online prose constructed-response (PCR) tasks. Under the default option, it was assumed 
that operational scores for approximately 90 percent of the online PCR responses would be assigned by IEA for the 
spring administration. The operational scores for the remaining online responses were assigned by human scorers. 
Human scoring was applied to responses that were scored while IEA was being trained as well as to additional 
responses routed to human scoring when there was uncertainty about the automated scores. 


For 10 percent of responses, a second “reliability” score was assigned. The purpose of the reliability score was to 
provide data for evaluating the consistency of scoring, which is done by evaluating scoring agreement. When IEA 
provided the first score of record, the second reliability score was a human score. 


4.3.1 Concepts Related to Automated Scoring 


The text below describes concepts related to automated scoring. 


Continuous Flow 
Continuous flow scoring results in an integrated connection between human scoring and automated scoring. It 
refers to a system of scoring where either an automated score, a human score, or both can be assigned based on a 
predetermined asynchronous operational flow. 


Training of IEA using Operational Data 
Continuous flow scoring facilitates the training of IEA using human scores assigned to operational online data 
collected early in the administration. Once IEA obtains sufficient data to train, it can be “turned on” and becomes 
the primary source of scoring (although human scoring continues for the 10 percent reliability sample and other 
responses that may be routed accordingly). 


Smart Routing 
Smart routing refers to the practice of using automated scoring results to detect responses that are likely to be 
challenging to score, and applying automated routing rules to obtain one or more additional human scores. Smart 
routing can be applied prompt by prompt to the extent needed to meet scoring quality criteria for automated 
scoring. 


Quality Criteria for Evaluating Automated Scoring 
The state leads approved specific quality criteria for evaluating automated scoring at the time IEA was trained. The 
primary evaluation criterion for IEA was based on responses to validity papers with “known” scores assigned by 
experts. For each prompt scored, a set of validity papers is used to monitor the human-scoring process over time. 
Validity papers are seeded into human scoring throughout the administration. The expectation is that IEA can score 
validity papers at least as accurately as humans can. 


Additional measures of inter-rater agreement for evaluating automated scoring were proposed based on the 
research literature (Williamson et al., 2012). These measures were previously utilized in Pearson’s automated 
scoring research and include Pearson correlation, kappa, quadratic-weighted kappa, exact agreement, and 
standardized mean difference. These measures are computed between pairs of human scores, as well as between 
IEA and humans, to evaluate how performance was the same or different. Criteria for evaluating the training of IEA 
given these measures include the following: 


• Pearson correlation between IEA-human should be within 0.1 of human-human. 
• Kappa between IEA-human should be within 0.1 of human-human. 
• Quadratic-weighted kappa between IEA-human should be within 0.1 of human-human. 
• Exact agreement between IEA-human should be within 5.25 percent of human-human. 
• Standardized mean difference between IEA-human should be less than 0.15. 
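

To make these measures concrete, the following Python sketch computes each index from paired integer scores and applies the tolerances listed above. It is an illustration under several assumptions, none of which come from the report: "within" is read as a one-sided tolerance (IEA-human may not fall more than the tolerance below human-human), the 5.25 percent tolerance is read as percentage points, the standardized mean difference uses a pooled standard deviation, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence

Scores = Sequence[int]


def exact_agreement(a: Scores, b: Scores) -> float:
    """Percentage of responses on which the two score sets match exactly."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def pearson_r(a: Scores, b: Scores) -> float:
    """Pearson product-moment correlation between two score sets."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)


def kappa(a: Scores, b: Scores, quadratic: bool = False) -> float:
    """Cohen's kappa; quadratic=True gives quadratic-weighted kappa.

    Undefined (division by zero) when both raters assign one identical score
    to every response.
    """
    def weight(i: int, j: int) -> int:
        return (i - j) ** 2 if quadratic else int(i != j)

    n = len(a)
    observed = sum(weight(x, y) for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[i] * count_b[j] * weight(i, j)
                   for i in count_a for j in count_b) / n ** 2
    return 1.0 - observed / expected


def standardized_mean_difference(a: Scores, b: Scores) -> float:
    """Mean difference divided by a pooled standard deviation (assumed form)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((y - mean_b) ** 2 for y in b) / (len(b) - 1)
    return (mean_a - mean_b) / math.sqrt((var_a + var_b) / 2)


def check_tolerances(iea: Scores, human1: Scores, human2: Scores) -> Dict[str, bool]:
    """Compare IEA-human (iea vs. human1) with human-human (human1 vs. human2)."""
    return {
        "pearson": pearson_r(human1, human2) - pearson_r(iea, human1) <= 0.1,
        "kappa": kappa(human1, human2) - kappa(iea, human1) <= 0.1,
        "qw_kappa": kappa(human1, human2, True) - kappa(iea, human1, True) <= 0.1,
        "exact": exact_agreement(human1, human2) - exact_agreement(iea, human1) <= 5.25,
        "smd": abs(standardized_mean_difference(iea, human1)) < 0.15,
    }
```

In practice the same indices would be computed separately for each trait score, since the criteria are stated per trait.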


The specific criteria for evaluating IEA included both primary and secondary criteria and are noted below. 


Primary Criteria—Based on responses to validity papers: With smart routing applied as needed, IEA 
agreement is as good as or better than human agreement for each trait score. 


Contingent Primary Criteria— Based on the training responses if validity responses are not available: With 
smart routing applied as needed, IEA-human exact agreement is within 5.25 percent of human-human exact 
agreement for each trait score. 


Secondary Criteria—Based on the training responses: With smart routing applied as needed, IEA-human 
differences on statistical measures for each trait score are within the Williamson et al. tolerances for 
subgroups with at least 50 responses. 


Hierarchy of Assigned Scores for Reporting 
When multiple scores are assigned for a given response, the following hierarchy determines which score was 
reported operationally: 


• The IEA score is reported if it is the only score assigned. 
• If an IEA score and a human score are assigned, the human score is reported. 
• If two human scores are assigned, the first human score is reported. 
• If a backread score and human and/or IEA scores are assigned, the backread score is reported. 
• If a resolution score is assigned and an adjudicated score is not assigned, the resolution score is reported 
  (note that if nonadjacent scores are encountered, responses are automatically routed to resolution). 
• If an adjudicated score is assigned, it is reported (note that if a resolution score is nonadjacent to the other 
  scores assigned, responses are automatically routed to adjudication). 


4.3.2 Sampling Responses Used for Training IEA 


For prompts trained using 2019 operational data, the early performance of human scoring was closely monitored to 
verify that an appropriate set of data would be available for training IEA. In particular, several characteristics of the 
human scoring data were monitored, including: 


• exact agreement between human scorers (the goal was for this to be at least 65 percent for each trait); 
• exact agreement between human scores conditioned on score point (the goal was for this to be at least 50 
  percent for each trait); 
• the number of responses at each score point (the goal was to have at least 40 responses at the highest 
  score points in the training samples used by IEA); and 
• the number of responses with two human scores assigned (note that IEA “ordered” additional scoring of 
  responses during the sampling period as needed). 


Although the desired characteristics of the training data were easily achieved for some prompts, they were more 
challenging to achieve for others. For some prompts, a subset of scores were reset and clarifying directions were 
provided to scorers to improve human-human agreement. For other prompts, special sampling approaches were 
used to increase the numbers of responses that received top scores. In addition, a healthy percentage of responses 
were backread during the sampling period and these scores as well as double human scores were all part of the data 
used to train IEA. 


4.3.3 Primary Criteria for Evaluating IEA Performance 


The primary criterion for evaluating IEA performance is based on validity papers and is stated as follows: 
With smart routing applied as needed, IEA agreement is as good as or better than human agreement for each trait 
score. 


To operationalize the primary criterion for a given prompt, the following general steps are undertaken: 


1. Determine agreement of the human scores with the validity papers for each trait. 
2. Calculate agreement of the IEA scores with the validity papers for each trait. 
3. Compare the IEA validity agreement with the human agreement. 
4. If the IEA validity agreement is greater than or equal to the human agreement for each trait, IEA can be 
   deployed operationally. 


In addition to looking at overall validity agreement, conditional agreement was also examined. In general, it was 
desirable for IEA to exceed 65 percent agreement at every score point as well as be close to or exceed the human 
validity agreement at each score point. 
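

The comparison described in this section can be sketched in a few lines of Python. This is an illustrative reading rather than the operational procedure: it assumes one human score and one IEA score per validity paper and per trait, that "agreement" means exact agreement with the known score, and that the trait names used as dictionary keys are supplied by the caller. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def agreement_pct(assigned: Sequence[int], known: Sequence[int]) -> float:
    """Overall percent exact agreement with the known validity scores."""
    return 100.0 * sum(a == k for a, k in zip(assigned, known)) / len(known)


def agreement_by_score_point(assigned: Sequence[int],
                             known: Sequence[int]) -> Dict[int, float]:
    """Percent exact agreement conditioned on the known validity score."""
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for a, k in zip(assigned, known):
        totals[k] += 1
        hits[k] += a == k
    return {k: 100.0 * hits[k] / totals[k] for k in sorted(totals)}


def meets_primary_criterion(iea: Dict[str, Sequence[int]],
                            human: Dict[str, Sequence[int]],
                            known: Dict[str, Sequence[int]]) -> bool:
    """Steps 1-4: IEA agreement must be >= human agreement for every trait."""
    return all(
        agreement_pct(iea[trait], known[trait])
        >= agreement_pct(human[trait], known[trait])
        for trait in known
    )


def score_points_below(assigned: Sequence[int], known: Sequence[int],
                       target: float = 65.0) -> List[int]:
    """Score points where conditional agreement does not exceed the target."""
    by_point = agreement_by_score_point(assigned, known)
    return [point for point, pct in by_point.items() if pct <= target]
```

The last helper reflects the additional check on conditional agreement: a prompt can pass the overall comparison and still show weak agreement at a single score point.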


4.3.4 Contingent Primary Criteria for Evaluating IEA Performance 


For many of the prompts trained in 2019, it was not possible to utilize human-scored validity responses in evaluating 
IEA performance. In these cases, IEA was evaluated based on IEA-human exact agreement for each trait score and 
compared to agreement based on responses that were double-scored by humans. A portion of the data was held out 
for evaluating IEA-human exact agreement according to the following steps: 


1. Determine exact agreement of the two human scores with each other for each trait. 
2. Calculate agreement of the IEA scores with the human scores for each trait. 
3. Compare the IEA-human agreement with the human-human agreement. 
4. If the IEA-human agreement is within 5.25 percent of the human-human agreement, IEA can be deployed 
   operationally. 


In addition to the overall comparison, the following performance thresholds were targeted in the test data set: 1) at 
least 65 percent overall IEA-human agreement; and 2) 50 percent IEA-human agreement by score point (i.e., 
conditioned on the human score). These targets went beyond the contingent primary criteria approved by the state 
leads. 


4.3.5 Applying Smart Routing 


With smart routing, the quality of automated scoring can be increased by routing responses that are more likely to 
disagree with a human score to receive an additional human score. 


When human scorers read a paper, they typically apply integer scores based on a scoring rubric. When there is 
strong agreement between two independent human readers, the readers might both assign a score of 3 such that 
the average score over both raters is also a 3 (i.e., (3+3)/2 = 3). IEA simulates this behavior, but because its scores 
come from an artificial intelligence algorithm, it generates continuous (i.e., decimalized) scores. In this case, the IEA 
score might be a 2.9 or 3.1. When human readers disagree on the score for a paper, say one reader gives the paper a 
score of 3 and another reader gives the paper a score of 4, the average of the two scores would be 3.5 (i.e., 
(3+4)/2 = 3.5). For this paper, IEA would likely provide a score between 3 and 4, say 3.4 or 3.6. Because this 
continuous score needs to be rounded to an integer score for reporting, it might be reported as a 3 or a 4, 
depending on the rounding rules. Smart routing involves routing those responses with “in between” IEA scores to 
additional human scoring because the nature of the responses suggests there may be less confidence in the IEA 
score. Since these “in between” IEA scores are based on modeling human scores, it follows that human scores may 
be less certain as well, and thus such responses tend to be the ones that it makes sense to have double-scored and 
possibly to resolve if the IEA and human scores are nonadjacent. 


Smart routing was utilized as needed to help IEA achieve targeted quality metrics (e.g., validity agreement or 
agreement with human scorers). Smart routing involved the application of the following four steps: 


1. The continuous IEA score for each of the two trait scores was rounded to the nearest score interval of 0.2, 
starting from zero. For example, IEA scores between 0 and 0.1 were rounded to an interval score of 0, 
scores between 0.1 and 0.3 were rounded to an interval score of 0.2, scores between 0.3 and 0.5 were 
rounded to an interval score of 0.4, and so on. 


2. Within each of these intervals, the percentage of exact agreement between IEA integer scores and the 
human scores was calculated for each trait. 


3. For each prompt, agreement rates were evaluated by rounding interval. Those intervals for which the 
agreement rates were below a designated threshold for either trait were identified. 


4. Once IEA scoring was implemented, responses within intervals for which IEA-human agreement was below 
the designated threshold were routed for additional human scoring. 
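

The four steps above can be sketched as follows for a single trait. This is a simplified illustration, not the operational routing logic: the handling of scores that fall exactly on an interval boundary (for example 0.1 or 0.3) is not specified in the text and here simply follows Python's rounding, the rounded integer IEA score is passed in rather than derived because the rounding rules are not stated, and the function names are hypothetical. With two traits, a response would be routed if its interval is flagged for either trait.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (continuous IEA score, integer IEA score, human score) for one trait
Record = Tuple[float, int, int]


def interval_of(score: float) -> float:
    """Step 1: round a continuous IEA score to the nearest 0.2 interval."""
    return round(round(score / 0.2) * 0.2, 1)


def low_agreement_intervals(records: Iterable[Record],
                            threshold_pct: float) -> Set[float]:
    """Steps 2-3: flag intervals whose IEA-human exact agreement is below threshold."""
    hits: Dict[float, int] = defaultdict(int)
    totals: Dict[float, int] = defaultdict(int)
    for continuous, iea_integer, human in records:
        key = interval_of(continuous)
        totals[key] += 1
        hits[key] += iea_integer == human
    return {k for k in totals if 100.0 * hits[k] / totals[k] < threshold_pct}


def needs_human_score(continuous: float, flagged: Set[float]) -> bool:
    """Step 4: route a new response if its interval was flagged in training."""
    return interval_of(continuous) in flagged


if __name__ == "__main__":
    training = [(2.95, 3, 3), (3.05, 3, 3), (3.42, 3, 4),
                (3.38, 3, 3), (3.61, 4, 4), (3.58, 4, 3)]
    flagged = low_agreement_intervals(training, threshold_pct=60.0)
    print(sorted(flagged), needs_human_score(3.44, flagged))
```

In this toy data the "in between" intervals around 3.4 and 3.6 show lower agreement than the interval around 3.0, which matches the intuition given earlier in this section.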


In training IEA, the scoring models without smart routing were evaluated first by applying either the primary validity 
criteria or the contingent criteria as described in Section 4.3. For those prompts that did not meet these criteria, 
increasing smart routing thresholds were applied in an iterative fashion to filter scores and evaluate the remaining 
scores against the criteria. That is, in any one iteration a particular smart routing threshold was applied such that 
only scores falling in intervals for which exact agreement exceeded the threshold were included in evaluating the 
criteria. If the primary or contingent criteria were not met with this level of smart routing, an increased smart 
routing threshold was applied iteratively until the primary or contingent criteria were met, or the maximum 
threshold reached. If the criteria were still not met after a maximum threshold was applied, different models were 
investigated and/or additional human scoring data utilized until an IEA scoring model was found that met the 
criteria. 


4.3.6 Evaluation of Secondary Criteria for Evaluating IEA Performance 


The secondary criteria for evaluating IEA performance involved comparing agreement indices for IEA-human scoring 
for various demographic subgroups. Because of the importance of protecting personally identifiable information 
(PII), student demographic data is stored and managed separately from the performance scoring data. For this 
reason, it was not possible to evaluate subgroup performance in real time as IEA was being trained. 


For those prompts trained on early operational data, attempts were made to prioritize the data being returned from 
the field to include data from states or districts where more diverse populations of students were anticipated. In 
addition, requests for additional human scores were made to increase the likelihood that there would be sufficient 
numbers of responses with two human scores for most of the demographic subgroups of interest. 


Once IEA was trained and deployed, scoring sets used in training were matched to demographic information so that 
agreement between IEA and human scorers could be evaluated across subgroups. The analysis was conducted for 
the following ten comparison groups: 


Table 4.6 Comparison Groups 


Group Type                    Comparison Groups
Sex                           Female
                              Male
Ethnicity                     American Indian/Alaska Native
                              Asian
                              Black/African American
                              Hispanic/Latino
                              Native Hawaiian or Other Pacific Islander
                              White
Special Instructional Needs   English Language Learners (ELL)
                              Students with Disabilities (SWD)


IEA-human agreement indices were calculated for all cases with an IEA score and at least one human score. Human- 
human agreement was calculated for all cases with two human scores. 


To evaluate the training of IEA for subgroups, the following criteria approved by the state leads for subgroups with 
at least 50 IEA-human scores and at least 50 human-human scores were applied: 


• Pearson correlation between IEA-human should be within 0.1 of human-human. 
• Kappa between IEA-human should be within 0.1 of human-human. 
• Quadratic-weighted kappa between IEA-human should be within 0.1 of human-human. 
• Exact agreement between IEA-human should be within 5.25 percent of human-human. 
• Standardized mean difference between IEA-human should be less than ±0.15 (this criterion was applied to 
  subgroups with at least 50 IEA-human scores). 


Although it was not expected that these criteria would be met for all subgroups for all prompts, if results of the 
evaluation between IEA and human scoring for subgroups for any prompt indicated that IEA performance 
persistently failed on the criteria listed above, consideration would be given to resetting the responses scored by IEA 
and reverting to human scoring until such time that an alternate IEA model could be established with improved 
subgroup performance. 


In addition to the secondary criteria approved by the State Leads, the performance of IEA was compared to the 
following targets on the various measures for subgroups with at least 50 responses: 


• Pearson correlation between IEA-human should be 0.70 or above. 
• Kappa between IEA-human should be 0.40 or above. 
• Quadratic-weighted kappa between IEA-human should be 0.70 or above. 
• Exact agreement between IEA-human should be 65 percent or above. 


These targets were not intended to be directly applied in decisions about whether to deploy IEA operationally or 
not. Such targets may or may not be met by human scoring for any particular prompt and/or subgroup, and if they 
are not met by human scoring, they are unlikely to be met by IEA scoring. Nevertheless, comparisons to these 
targets provided additional information about IEA performance (and human scoring) in an absolute sense. 


4.3.7 Inter-rater Agreement for Prose Constructed Response 


This section presents the inter-rater agreement for operational results for the online prose constructed-response 
(PCR) tasks by trait and grade level. PCR task items are scored on two traits: (1) Reading Comprehension and Written 
Expression and (2) Knowledge of Language and Conventions. 


For 10 percent of responses, a second “reliability” score was assigned. The purpose of the reliability score is to 
provide data for evaluating the consistency of scoring, which is done by evaluating scoring agreement. Inter-rater 
agreement is the agreement between the first and second scores assigned to student responses and is the measure 
of how often scorers agree with each other. Pearson scoring staff used inter-rater agreement indices as one factor in 
determining the needs for continuing training and intervention on both individual and group levels. Inter-rater 
agreement expectations are provided in Table 4.5 in Section 4.2.4. For ELA/L PCR traits, the expectation for 
agreement is an inter-rater agreement of 65 percent or higher between two scorers. When IEA provided the first 
score of record, the second reliability score was a human score. For those states choosing the human-scoring option, 
the second reliability score was assigned by IEA. For a subset of responses, the first and second score were both 
human scores. 


Table 4.7 presents the average agreement across the PCRs for each grade level by trait. The number of prompts 
included in the analyses is listed for each grade level. The agreement indices (exact agreement, kappa, quadratic- 
weighted kappa, and Pearson correlation) were calculated separately by PCR for each trait (Written Expression and 
Conventions). For each grade level, the agreement indices were averaged across the PCRs. Table 4.7 presents the 
average count and the average for the agreement indices. 


The exact agreement for the PCR traits is above the criterion of 65 percent agreement for all PCRs. The 
strength of agreement between raters is moderate to substantial as defined by Landis and Koch (1977) 
for all PCRs. The quadratic-weighted kappa (QW Kappa) distinguishes between differences in ratings that are close to 
each other versus larger differences. The weighted kappa indicates substantial to almost perfect agreement for all 
grades. The Pearson correlations (r) ranged from .74 to .90. 


During operational scoring, the PCR agreement rates are monitored for quality and items not meeting the criteria 
are shared with the handscoring group. After the operational administration, the performance of all the PCRs is 
provided to the content team as feedback for re-using PCRs and in order to inform development of future PCRs. This 
provides evidence for continuous improvement of the testing program. 


Table 4.7 PCR Average Agreement Indices by Test 


                            Written Expression               Conventions
        Number                              QW                               QW
Test    of PCRs   Count     Exact   Kappa   Kappa   r        Exact   Kappa   Kappa   r
ELA03   5         38,626    71.88   0.54    0.74    0.74     73.56   0.58    0.77    0.77
ELA04   5         41,309    69.22   0.56    0.81    0.82     71.60   0.59    0.83    0.83
ELA05   5         77,241    70.74   0.58    0.84    0.84     71.02   0.59    0.83    0.83
ELA06   5         47,325    74.30   0.64    0.86    0.86     74.66   0.64    0.85    0.85
ELA07   5         61,267    73.36   0.63    0.88    0.88     74.52   0.65    0.87    0.87
ELA08   5         51,067    76.18   0.68    0.90    0.90     77.02   0.69    0.89    0.89
ELA09   5         15,051    71.70   0.61    0.87    0.87     71.50   0.60    0.84    0.84
ELA10   5         24,432    73.20   0.64    0.90    0.90     76.80   0.68    0.89    0.89
ELA11   5         5,991     76.56   0.66    0.85    0.86     77.98   0.67    0.86    0.86


Section 5: Classical Item Analysis 


5.1 Overview 


This section describes the results of the classical item analysis conducted for data obtained from the operational test 
items. All ELA/L and mathematics assessments were pre-equated. In addition, ELA/L assessments were post-equated 
for some states or agencies for score reporting (see Section 7). For pre-equated tests, the item statistics provided in 
this section were from prior operational administrations and reflect the statistics that were used at test construction 
and for score reporting for some states and agencies. For the post-equated tests, the statistics from the spring 
administration were also provided in this section. Item analysis serves two purposes: to inform item exclusion 
decisions for IRT analysis and to provide item statistics for the item bank. 


Item analysis included data from the following types of items: key-based selected-response items, rule-based 
machine-scored items, and handscored constructed-response items. For each item, the analysis produced item 
difficulty, item discrimination, and item response frequencies. 


5.2 Data Screening Criteria 


Item analyses were conducted by test form based on administration mode. In preparation for item analysis, student 
response files were processed to verify that the data were free of errors. Pearson Customer Data Quality (CDQ) staff 
ran predefined checks on all data files and verified that all fields and data needed to perform the statistical analyses 
were present and within expected ranges. 


Before beginning item analysis, Pearson performed the following data screening operations: 


1. All records with an invalid form number were excluded. 
2. All records that were flagged as “void” were excluded. 
3. All records where the student attempted fewer than 25 percent of items were excluded. 
4. For students with more than one valid record, the record with the higher raw score was chosen. 
5. Records for students with administration issues or anomalies were excluded. 
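

The screening operations above can be expressed as a single pass over the student records. The sketch below is illustrative only: the `Record` fields and the `screen` function are hypothetical names, the exclusions are applied before duplicate records are resolved (the report does not state the order), and ties in raw score keep the record seen first.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Record:
    student_id: str
    form_valid: bool        # False if the form number is invalid
    void: bool              # True if the record was flagged as "void"
    items_attempted: int
    items_total: int
    raw_score: int
    admin_issue: bool = False  # administration issue or anomaly


def screen(records: Iterable[Record]) -> List[Record]:
    """Apply the five screening rules and keep one record per student."""
    best: Dict[str, Record] = {}
    for rec in records:
        # Rules 1, 2, and 5: drop invalid forms, voids, and flagged administrations.
        if not rec.form_valid or rec.void or rec.admin_issue:
            continue
        # Rule 3: drop records with fewer than 25 percent of items attempted.
        if rec.items_attempted < 0.25 * rec.items_total:
            continue
        # Rule 4: among a student's valid records, keep the higher raw score.
        kept = best.get(rec.student_id)
        if kept is None or rec.raw_score > kept.raw_score:
            best[rec.student_id] = rec
    return list(best.values())
```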


5.3 Description of Classical Item Analysis Statistics 


A set of classical item statistics were computed for each operational item by form and by administration mode. Each 
statistic was designed to evaluate the performance of each item. 


The following statistics and associated flagging rules were used to identify items that were not performing as 
expected: 


Classical item difficulty indices (p-value and average item score) 
When constructing tests, a wide range of item difficulties is desired (i.e., from easy to hard items) so that students of 
all ability levels can be assessed with precision. At the operational stage, item difficulty statistics are used by test 
developers to build forms that meet desired test difficulty targets. 


For dichotomously scored items, item difficulty is indicated by its p-value, which is the proportion of students who 
answered that item correctly. The range for p-values is from .00 to 1.00. Items with high p-values are easy items and 
those with low p-values are difficult items. Dichotomously scored items were flagged for review if the p-value was 
above .95 (i.e., too easy) or below .25 (i.e., too difficult). 


For polytomously scored items, difficulty is indicated by the average item score (AIS). The AIS can range from .00 to 
the maximum total possible points for an item. To facilitate interpretation, the AIS values for polytomously scored 
items are often expressed as percentages of the maximum possible score, which are equivalent to the p-values of 
dichotomously scored items. Polytomously scored items were flagged for review if this p-value equivalent was 
above .95 or below .25. 
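

Both difficulty indices reduce to the same calculation once the AIS is divided by the maximum possible score. A minimal Python sketch follows; the function names are hypothetical, and the flagging thresholds are the .95 and .25 values stated above.

```python
from typing import Optional, Sequence


def item_difficulty(scores: Sequence[int], max_points: int = 1) -> float:
    """p-value (max_points=1) or AIS as a proportion of the maximum score."""
    average_item_score = sum(scores) / len(scores)
    return average_item_score / max_points


def difficulty_flag(p: float, low: float = 0.25, high: float = 0.95) -> Optional[str]:
    """Return a review flag for items outside the stated difficulty range."""
    if p > high:
        return "too easy"
    if p < low:
        return "too difficult"
    return None


if __name__ == "__main__":
    dichotomous = [1, 0, 1, 1]   # three of four students answered correctly
    polytomous = [2, 1, 0, 3]    # a 0-3 point item with an AIS of 1.5
    print(item_difficulty(dichotomous), item_difficulty(polytomous, max_points=3))
```

For the 0-3 point item, an AIS of 1.5 corresponds to a p-value equivalent of .50, so the item would not be flagged.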


The percentage of students choosing each response option 
Selected-response items on the summative assessments refer primarily to single-select multiple-choice scored items. 
These items require that the student select a response from a number of answer options. These statistics for single- 
select multiple-choice items indicate the percentage of students who select each of the answer options and the 
percentage that omit the item. The percentages are also computed for the high-performing subgroup of students 
who scored at the top 20 percent on the assessment. Items were flagged for review if more high-performing 
students chose the incorrect option than the correct response. Such a result could indicate that the item has 
multiple correct answers or is miskeyed. 


Item-total correlation 
This statistic describes the relationship between students’ performance on a specific item and their performance on 
the total test. The item-total correlation is usually referred to as the item discrimination index. For operational item 
analysis, the total score on the assessment was used as the total test score. The polyserial correlation was calculated 
for both selected-response items and constructed-response items as an estimate of the correlation between an 
observed continuous variable and an unobserved continuous variable hypothesized to underlie the variable with 
ordered categories (Olsson et al., 1982). Item-total correlations can range from -1.00 to 1.00. Desired values are 
positive and larger than .15. Negative item-total correlations indicate that low-ability students perform better on an 
item than high-ability students, an indication that the item may be potentially flawed. Item-total correlations 
below .15 were flagged for review. Items with extremely low or negative values were considered for exclusion from 
IRT calibrations or linking (refer to Section 7 for details on item inclusion and exclusion criteria for IRT analyses). 


Distractor-total correlation 
For selected-response items, this estimate describes the relationship between selecting an incorrect response (i.e., a 
distractor) for a specific item and performance on the total test. The item-total correlation (refer to the item-total 
correlation analysis above) is calculated for the distractors. Items with distractor-total correlations above .00 were 
flagged for review as these items may have multiple correct answers, be miskeyed, or have other content issues. 


Percentage of students omitting or not reaching each item 
For both selected-response and constructed-response items, this statistic is useful for identifying problems with test
features such as testing time and item/test layout. Typically, if students have an adequate amount of testing time,
approximately 95 percent of students should attempt to answer each question on the test. A distinction is made
between “omit” and “not reached” for items without responses.


• An item is considered “omit” if the student responded to subsequent items.


• An item is considered “not reached” if the student did not respond to any subsequent items.


Patterns of high omit or not-reached rates for items located near the end of a test section may indicate that
students did not have adequate time. Items with high omit rates were flagged. Omit rates for constructed-response 
items tend to be higher than for selected-response items. Therefore, the omit rate for flagging individual items was 5 
percent for selected-response items and 15 percent for constructed-response items. If a student omitted an item, 
then the student received a score of 0 for that item and was included in the n-count for that item. However, if an 
item was near the end of the test and classified as not reached, the student did not receive a score and was not 
included in the n-count for that item. 
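

The omit and not-reached rules described above can be illustrated with a minimal Python sketch. It assumes that one student's responses are stored in administration order with None marking a missing response, and that the whole list is treated as a single test section. The function names are hypothetical and are not taken from the operational scoring system.

```python
from typing import List, Optional


def classify_missing(responses: List[Optional[int]]) -> List[str]:
    """Label each item as 'answered', 'omit', or 'not_reached'."""
    # Position of the last item the student responded to (-1 if none).
    last_answered = max(
        (i for i, r in enumerate(responses) if r is not None), default=-1
    )
    labels = []
    for i, r in enumerate(responses):
        if r is not None:
            labels.append("answered")
        elif i < last_answered:
            # A later item has a response, so this item was skipped.
            labels.append("omit")
        else:
            # No later item has a response.
            labels.append("not_reached")
    return labels


def item_scores(responses: List[Optional[int]]) -> List[Optional[int]]:
    """Score omits as 0; leave not-reached items as None (excluded from the n-count)."""
    labels = classify_missing(responses)
    return [0 if lab == "omit" else r for r, lab in zip(responses, labels)]
```

For example, a student who answers the first and third of five items has one omit (item 2, scored 0) and two not-reached items (items 4 and 5, excluded from the n-count).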


Distribution of item scores 
For constructed-response items, examination of the distribution of scores is helpful to identify how well the item is
functioning. If no students’ responses are assigned the highest possible score point, this may indicate that the item is
not functioning as expected (e.g., the item could be confusing, poorly worded, or just unexpectedly difficult), the
scoring rubric is flawed, and/or students did not have an opportunity to learn the content. In addition, if all or most
students score at the extreme ends of the distribution (e.g., 0 and 2 for a 3-category item), this may indicate that
there are problems with the item or the rubric so that students can receive either full credit or no credit at all, but
not partial credit.


The raw score frequency distributions for constructed-response items were computed to identify items with few or 
no observations at any score points. Items with no observations or a low percentage (i.e., less than 3 percent) of 
students obtaining any score point were flagged. In addition, constructed-response items were flagged if they had U- 
shaped distributions, with high frequencies for extreme scores and very low frequencies for middle score categories. 
Items with such response patterns may pose problems during the IRT calibrations and therefore may need to be 
excluded (refer to Section 7 for more information). 


5.4 Summary of Classical Item Analysis Flagging Criteria 


In summary, items are flagged for review if the item analysis yielded any of the following results:


1. p-value above .95 for dichotomous items or polytomous items


2. p-value below .25 for dichotomous items or polytomous items


3. item-total correlation below .15


4. any distractor-total correlation above .00


5. greater number of high-performing students (top 20 percent) choosing a distractor rather than the keyed
response


6. high percentage of omits: above 5 percent for selected-response items and above 15 percent for
constructed-response items


7. high percentage that did not reach the item: above 5 percent for selected-response items and above 15
percent for constructed-response items


8. constructed-response items with a score value obtained by less than 3 percent of responses


Pearson’s psychometric staff carefully reviewed the flagged items and brought items to the Priority Alert Task Force 
to decide if the items were problematic and should be excluded from scoring. 
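

As an illustration only, the flagging criteria summarized in this section could be applied to precomputed item statistics along the following lines. The `ItemStats` container, its field names, and the flag labels are hypothetical. Rates and percentages are assumed to be expressed as proportions, and the comparison of high-performing students (top 20 percent) is assumed to have been computed elsewhere and passed in as a boolean.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ItemStats:
    p_value: float
    item_total_corr: float
    is_constructed_response: bool = False
    distractor_corrs: List[float] = field(default_factory=list)
    top20_prefers_distractor: bool = False
    omit_rate: float = 0.0
    not_reached_rate: float = 0.0
    score_point_props: List[float] = field(default_factory=list)


def flag_item(s: ItemStats) -> List[str]:
    """Return the list of flagging criteria that the item meets."""
    flags = []
    if s.p_value > 0.95:
        flags.append("p_value_above_.95")
    if s.p_value < 0.25:
        flags.append("p_value_below_.25")
    if s.item_total_corr < 0.15:
        flags.append("item_total_below_.15")
    if any(c > 0.0 for c in s.distractor_corrs):
        flags.append("distractor_total_above_.00")
    if s.top20_prefers_distractor:
        flags.append("top20_prefers_distractor")
    # 5 percent threshold for selected response, 15 percent for constructed response.
    limit = 0.15 if s.is_constructed_response else 0.05
    if s.omit_rate > limit:
        flags.append("omit_rate_high")
    if s.not_reached_rate > limit:
        flags.append("not_reached_rate_high")
    if s.is_constructed_response and any(p < 0.03 for p in s.score_point_props):
        flags.append("score_point_below_3_percent")
    return flags
```

A flag from this kind of screen only marks an item for review; the decision to exclude an item rests with the review process described above.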
5.5 Classical Item Analysis Results


This section presents tables summarizing the analyses for items on the spring operational forms. The mathematics 
assessments were pre-equated, meaning that the scoring was based on item parameters estimated using data from 
earlier administrations. For the pre-equated grades/subjects, item analysis results in this section are the item 
statistics from prior administrations that were used to make decisions during test construction and for scoring. The 
ELA/L assessments were both pre-equated and post-equated. Therefore, the item analysis results from both prior 
administrations and from the spring operational administration are presented in this section. 


• Tables 5.1 and 5.2 present pre-administration and post-administration p-value information by grade for the
ELA/L operational items.


• Table 5.3 presents pre-administration p-value information by grade/course for the mathematics operational
items.


• Tables 5.4 and 5.5 present pre-administration and post-administration item-total correlations by grade for
the ELA/L operational items.


• Table 5.6 presents pre-administration item-total correlations by grade/course for the mathematics
operational items.


An operational item may appear on multiple test forms. The tables list unique item counts for an assessment and the 
reported item statistics may be based on student responses across multiple occurrences of an item. 


Spoiled or “do not score” items were excluded from the total test score in item analysis. These items were removed 
from scoring because of item performance, technical scoring issues, content concerns, or multiple/no correct 
answers. 


The fall 2018 forms were based on the spring 2018 operational forms; therefore, the item analyses for these forms 
were reported in the 2017-2018 Technical Report. Some forms on the spring 2019 administration were based on 
spring 2017 and 2018 administrations; therefore, the item analyses for these forms were reported in the 2016-2017 
and the 2017-2018 Technical Reports. 


Table 5.1 Summary of Pre-Administration p-Values for ELA/L Operational Items by Grade 


Grade   N of Unique Items   Mean p-Value   SD p-Value   Min p-Value   Max p-Value   Median p-Value
3 58 0.47 0.17 0.16 0.82 0.47 
4 74 0.47 0.16 0.18 0.86 0.46 
5 66 0.47 0.14 0.09 0.83 0.45 
6 77 0.48 0.15 0.15 0.92 0.48 
7 62 0.48 0.14 0.22 0.83 0.47 
8 72 0.48 0.13 0.20 0.85 0.48 
9 88 0.43 0.13 0.09 0.78 0.41 
10 63 0.42 0.11 0.14 0.64 0.42 
11 62 0.36 0.10 0.16 0.65 0.35 


Table 5.2 Summary of Post-Administration p-Values for ELA/L Operational Items by Grade


Grade   N of Unique Items   Mean p-Value   SD p-Value   Min p-Value   Max p-Value   Median p-Value
3 58 0.47 0.18 0.18 0.84 0.45 
4 74 0.47 0.16 0.21 0.82 0.46 
5 66 0.48 0.14 0.10 0.84 0.46 
6 77 0.49 0.15 0.18 0.91 0.49 
7 62 0.50 0.14 0.24 0.80 0.49 
8 72 0.49 0.13 0.22 0.82 0.49 
9 88 0.44 0.14 0.09 0.77 0.42 
10 63 0.47 0.12 0.15 0.73 0.48 
11 62 0.37 0.10 0.21 0.65 0.36 


Table 5.3 Summary of p-Values for Mathematics Operational Items by Grade/Course 


Grade/Course   N of Unique Items   Mean p-Value   SD p-Value   Min p-Value   Max p-Value   Median p-Value
3 77 0.57 0.20 0.19 0.91 0.57 
4 72 0.52 0.20 0.06 0.91 0.51 
5 71 0.50 0.18 0.13 0.84 0.48 
6 69 0.41 0.18 0.11 0.78 0.40 
7 67 0.39 0.17 0.08 0.75 0.37 
8 64 0.33 0.18 0.08 0.68 0.29 
A1 111 0.31 0.16 0.05 0.73 0.30 
GO 118 0.29 0.17 0.05 0.79 0.28 
A2 109 0.27 0.15 0.05 0.82 0.26 
M1 42 0.36 0.16 0.08 0.65 0.39 
M2 41 0.29 0.20 0.05 0.69 0.25 
M3 40 0.27 0.15 0.05 0.61 0.24 


Note: A1 = Algebra I, GO = Geometry, A2 = Algebra II, M1 = Integrated Mathematics I, M2 = Integrated Mathematics
II, M3 = Integrated Mathematics III.


Table 5.4 Summary of Pre-Administration Item-Total Correlations for ELA/L Operational Items by Grade


Grade   N of Unique Items   Mean Polyserial   SD Polyserial   Min Polyserial   Max Polyserial   Median Polyserial
3 58 0.54 0.12 0.23 0.78 0.52 
4 74 0.47 0.14 0.23 0.81 0.45 
5 66 0.48 0.14 0.19 0.83 0.47 
6 77 0.50 0.14 0.20 0.83 0.48 
7 62 0.50 0.15 0.26 0.83 0.48 
8 72 0.49 0.15 0.26 0.83 0.47 
9 88 0.50 0.17 0.25 0.88 0.46 
10 63 0.50 0.17 0.18 0.86 0.47 
11 62 0.47 0.16 0.17 0.85 0.43 


Table 5.5 Summary of Post-Administration Item-Total Correlations for ELA/L Operational Items by Grade 


Grade   N of Unique Items   Mean Polyserial   SD Polyserial   Min Polyserial   Max Polyserial   Median Polyserial
3 58 0.56 0.13 0.30 0.81 0.54 
4 74 0.49 0.15 0.15 0.82 0.47 
5 66 0.52 0.15 0.20 0.86 0.51 
6 77 0.53 0.15 0.30 0.87 0.51 
7 62 0.53 0.15 0.26 0.86 0.49 
8 72 0.52 0.16 0.27 0.88 0.49 
9 88 0.51 0.18 0.25 0.88 0.46 
10 63 0.52 0.17 0.19 0.88 0.48 
11 62 0.48 0.18 0.15 0.86 0.44 


Table 5.6 Summary of Item-Total Correlations for Mathematics Operational Items by Grade/Course


Grade/Course   N of Unique Items   Mean Polyserial   SD Polyserial   Min Polyserial   Max Polyserial   Median Polyserial
3 77 0.52 0.13 0.28 0.81 0.52 
4 72 0.52 0.12 0.26 0.76 0.51 
5 71 0.52 0.11 0.20 0.77 0.52 
6 69 0.54 0.14 0.17 0.92 0.54 
7 67 0.51 0.16 0.17 0.82 0.53 
8 64 0.49 0.12 0.24 0.73 0.50 
A1 111 0.45 0.15 0.15 0.75 0.45 
GO 118 0.49 0.16 0.17 0.95 0.48 
A2 109 0.49 0.14 0.19 0.84 0.50 
M1 42 0.50 0.13 0.24 0.75 0.50 
M2 41 0.47 0.15 0.21 0.83 0.46 
M3 40 0.46 0.13 0.18 0.69 0.46 


Note: A1 = Algebra I, GO = Geometry, A2 = Algebra II, M1 = Integrated Mathematics I,
M2 = Integrated Mathematics II, M3 = Integrated Mathematics III.


Section 6: Differential Item Functioning


6.1 Overview 


Differential item functioning (DIF) analyses were conducted using the data obtained from the operational items. If an 
item performs differentially across identifiable subgroups (e.g., gender or ethnicity) when students are matched on 
ability, the item may be measuring something other than the intended construct (i.e., possible evidence of DIF). It is 
important, however, to recognize that item performance differences flagged for DIF might be related to actual 
differences in relevant knowledge or skills (item impact) or statistical Type I error. As a result, DIF statistics are used
to identify potential item bias. Subsequent reviews by content experts and bias/sensitivity committees are required 
to determine the source and meaning of performance differences. 


In this section, the DIF statistics used at test construction to make decisions about items are provided for all 
mathematics online and paper and ELA/L tests. In addition, DIF statistics are presented for the ELA/L online post- 
equated tests. 


6.2 DIF Procedures 


Dichotomous Items 
The Mantel-Haenszel (MH) DIF statistic was calculated for selected-response items and for dichotomously scored
constructed-response items. In this method, students are classified into relevant subgroups of interest (e.g., gender or
ethnicity). Using the raw score total as the criterion, students in a certain total score category in the focal group (e.g.,
females) are compared with students in the same total score category in the reference group (e.g., males). For each
item, students in the focal group are also compared to students in the reference group who performed equally well
on the test as a whole. The common odds ratio is estimated across all categories of matched student ability using
the following formula (Dorans & Holland, 1993), and the resulting estimate is interpreted as the relative likelihood of
success on a particular item for members of two groups when matched on ability.


$$
\hat{\alpha}_{MH} = \frac{\sum_{s=1}^{S} R_{rs} W_{fs} / N_{ts}}{\sum_{s=1}^{S} R_{fs} W_{rs} / N_{ts}} \tag{6-1}
$$


in which:
$S$ = the number of score categories,
$R_{rs}$ = the number of students in the reference group in score category $s$ who answer the item correctly,
$W_{fs}$ = the number of students in the focal group in score category $s$ who answer the item incorrectly,
$R_{fs}$ = the number of students in the focal group in score category $s$ who answer the item correctly,
$W_{rs}$ = the number of students in the reference group in score category $s$ who answer the item incorrectly, and
$N_{ts}$ = the total number of students in score category $s$.


To facilitate the interpretation of MH results, the common odds ratio is frequently transformed to the delta scale 
using the following formula (Holland & Thayer, 1988): 


$$
\text{MH D-DIF} = -2.35 \ln(\hat{\alpha}_{MH}) \tag{6-2}
$$


Positive values indicate DIF in favor of the focal group (i.e., positive DIF items are differentially easier for the focal 
group), whereas negative values indicate DIF in favor of the reference group (i.e., negative DIF items are 
differentially easier for the reference group). 
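

The two formulas above can be sketched in Python as follows. This is a minimal illustration, not the operational implementation. It assumes the data have already been tabulated into one tuple of counts per total-score category, and the function name and input layout are hypothetical. Score categories with no students are skipped, and the sketch does not guard against a zero denominator.

```python
import math
from typing import Iterable, Tuple


def mh_d_dif(strata: Iterable[Tuple[int, int, int, int]]) -> Tuple[float, float]:
    """Return (common odds ratio, MH D-DIF).

    Each stratum is (R_r, W_r, R_f, W_f): counts of correct and incorrect
    responses for the reference group, then for the focal group.
    """
    num = 0.0
    den = 0.0
    for r_r, w_r, r_f, w_f in strata:
        n_total = r_r + w_r + r_f + w_f
        if n_total == 0:
            continue  # empty score category contributes nothing
        num += r_r * w_f / n_total
        den += r_f * w_r / n_total
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)
```

With a single score category in which 30 of 40 reference-group students and 20 of 40 focal-group students answer correctly, the odds ratio is 3 and the MH D-DIF is negative, indicating an item that is differentially easier for the reference group.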


Polytomous Items 

For polytomously scored constructed-response items, the MH D-DIF statistic is not calculated; instead the 
standardization DIF (Dorans & Schmitt, 1991; Zwick et al., 1997; Dorans, 2013), in conjunction with the Mantel chi- 
square statistic (Mantel, 1963; Mantel & Haenszel, 1959), is used to identify items with DIF. 


The standardization DIF compares the item means of the two groups after adjusting for differences in the 
distribution of students across the values of the matching variable (i.e., total test score) and is calculated using the 
following formula: 


$$
\text{STD-EISDIF} = \frac{\sum_{s=1}^{S} N_{fs} \, E_f(Y \mid X = s)}{\sum_{s=1}^{S} N_{fs}} - \frac{\sum_{s=1}^{S} N_{fs} \, E_r(Y \mid X = s)}{\sum_{s=1}^{S} N_{fs}} \tag{6-3}
$$


in which:
$X$ = the total score,
$Y$ = the item score,
$S$ = the number of score categories,
$N_{rs}$ = the number of students in the reference group in score category $s$,
$N_{fs}$ = the number of students in the focal group in score category $s$,
$E_r$ = the expected item score for the reference group, and
$E_f$ = the expected item score for the focal group.


A positive STD-EISDIF value means that, conditional on the total test score, the focal group has a higher mean item
score than the reference group. In contrast, a negative STD-EISDIF value means that, conditional on the total test
score, the focal group has a lower mean item score than the reference group.
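

A minimal Python sketch of the standardization index is shown below. It assumes the conditional item means have already been computed for each total-score category and weights the focal-minus-reference difference by the focal group count, which matches the sign convention stated above. The function name and input layout are hypothetical.

```python
from typing import Sequence, Tuple


def std_eisdif(strata: Sequence[Tuple[int, float, float]]) -> float:
    """Standardized expected item score difference.

    Each stratum is (n_focal, mean_item_score_focal, mean_item_score_reference)
    for one total-score category.
    """
    total_focal = sum(n_f for n_f, _, _ in strata)
    weighted_diff = sum(n_f * (e_f - e_r) for n_f, e_f, e_r in strata)
    return weighted_diff / total_focal
```

For example, with 10 focal students in a category where the focal mean is 0.5 points lower and 30 focal students in a category with equal means, the index is -0.125.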


Classification
Based on the DIF statistics and significance tests, items are classified into three categories and assigned values of A,
B, or C (Zieky, 1993). Category A items contain negligible DIF, Category B items exhibit slight to moderate DIF, and
Category C items possess moderate to large DIF values. Positive values indicate that, conditional on the total score,
the focal group has a higher mean item score than the reference group. In contrast, negative DIF values indicate
that, conditional on the total test score, the focal group has a lower mean item score than the reference group. The
flagging criteria for dichotomously scored items are presented in Table 6.1; the flagging criteria for polytomously
scored constructed-response items are provided in Table 6.2.


Table 6.1 DIF Categories for Dichotomous Selected-Response and Constructed-Response Items 


DIF Category             Criteria
A (negligible)           Absolute value of the MH D-DIF is not significantly different from zero, or is less than
                         one.
B (slight to moderate)   1. Absolute value of the MH D-DIF is significantly different from zero but not from one,
                            and is at least one; or
                         2. Absolute value of the MH D-DIF is significantly different from one, but is less than
                            1.5.
                         Positive values are classified as “B+” and negative values as “B-”.
C (moderate to large)    Absolute value of the MH D-DIF is significantly different from one, and is at least 1.5.
                         Positive values are classified as “C+” and negative values as “C-”.


Table 6.2 DIF Categories for Polytomous Constructed-Response Items 


DIF Category Criteria 
A (negligible) Mantel Chi-square p-value > 0.05 or |STD-EISDIF/SD| < 0.17 
B (slight to moderate) Mantel Chi-square p-value < 0.05 and |STD-EISDIF/SD| > 0.17 
C (moderate to large) Mantel Chi-square p-value < 0.05 and |STD-EISDIF/SD| > 0.25 


Note: STD-EISDIF = standardized DIF; SD = total group standard deviation of item score. 
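

The polytomous rules in Table 6.2 can be expressed as a short Python sketch. Several choices here are assumptions rather than documented practice: the C rule is checked before the B rule because the two thresholds as printed overlap, values that fall exactly on a boundary are assigned to category B, and the "+" or "-" suffix follows the sign of STD-EISDIF. The function name is hypothetical.

```python
def classify_polytomous_dif(mantel_p: float, std_eisdif: float, item_sd: float) -> str:
    """Assign an A/B/C DIF category to a polytomous item."""
    effect = abs(std_eisdif / item_sd)
    if mantel_p > 0.05 or effect < 0.17:
        return "A"
    # Sign convention: positive values favor the focal group.
    sign = "+" if std_eisdif > 0 else "-"
    if effect > 0.25:
        return "C" + sign
    return "B" + sign
```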


6.3 Operational Analysis DIF Comparison Groups 


DIF analyses were conducted on each test form for designated comparison groups defined on the basis of 
demographic variables including: gender, race/ethnicity, economic disadvantage, and special instructional needs 
such as students with disabilities (SWD) or English learners (EL). Student demographic information was provided by 
the states and district and captured in PearsonAccessnext by means of a student data upload. The demographic data
was verified by the states and district prior to score reporting. These comparison groups are specified in Table 6.3. 


Table 6.3 Traditional DIF Comparison Groups


Grouping Variable             Focal Group                                   Reference Group
Gender                        Female                                        Male
Ethnicity                     American Indian/Alaska Native (Amerindian)    White
                              Asian                                         White
                              Black or African American                     White
                              Hispanic/Latino                               White
                              Native Hawaiian or Pacific Islander           White
                              Multiple Race Selected                        White
Economic Status*              Economically Disadvantaged (EcnDis)           Not Economically Disadvantaged (NoEcnDis)
Special Instructional Needs   English Learner (ELY)                         Non English Learner (ELN)
                              Students with Disabilities (SWDY)             Students without Disabilities (SWDN)


*Economic status was based on participation in National School Lunch Program (receipt of free or reduced-price 
lunch). 


DIF analyses were conducted when the following sample size requirements were met: 


• the smaller group, reference or focal, had at least 100 students, and


• the combined group, reference and focal, had at least 400 students.


6.4 Operational Differential Item Functioning Results 


Appendix 6 presents tables summarizing the DIF results for the spring pre-administration item DIF results that were 
used to inform decisions at test construction for both ELA/L and mathematics, as well as the post-administration 
item DIF results for ELA/L. There is one table prepared for each content and grade level (e.g., ELA/L Grade 3). The fall 
2018 forms were based on spring 2018 operational forms. The DIF analyses for these forms are reported in the 
2017-2018 Technical Report. 


Spoiled or “do not score” items were excluded from the total test score for each form in DIF analysis. These items 
were removed from scoring because of item performance, technical scoring issues, content concerns, multiple 
correct answers, or no correct answers. However, the tables in this section may include items for certain grade levels 
that were excluded from scoring based on later analyses (refer to Section 7.5 Items Excluded from Score Reporting 
for more information). 


In the DIF results tables, the column “DIF Comparisons” identifies the focal and reference groups for the analysis 
performed; “Total N of Unique Items” reports the number of unique items included in the analysis. “Total N of Item 
Occurrences Included in DIF Analysis” reports the number of occurrences with sufficient sample sizes to be included 
in DIF analyses. Because DIF analysis is conducted at the parent level for PCRs in ELA/L tests, the total number of 
unique items reported in the DIF analysis is smaller than the total number of items reported in the classical item 
analysis (see Tables 5.1 and 5.2) and the IRT summary statistics (see Tables 7.7-7.9) for each ELA/L test. In addition, 
“0” indicates that the DIF analysis did not classify any items in the particular DIF category, while “n/a” indicates that 
the DIF analysis was not performed due to insufficient sample sizes. 


Table 6.4 Pre-Administration Differential Item Functioning for ELA/L Grade 3


DIF Comparison   Total N of Unique Items   C- DIF (N, % of Total)   B- DIF (N, % of Total)   A DIF (N, % of Total)   B+ DIF (N, % of Total)   C+ DIF (N, % of Total)
Male vs Female 52 1 2 : ‘ 50 96 1 2 
White vs Black 52 ; ‘ 1 2 51 98 
White vs Hispanic 52 ‘ ‘ 3 6 49 94 
White vs Asian 52 1 2 . ‘ 51 98 
White vs Amerindian 52 : ‘ ; . 52 100 
White vs Pacific Islander 52 F : 1 2 50 96 1 2 
White vs Multiracial 52 : : : , 52 100 
NoEcnDis vs EcnDis 52 P ‘ . & 52 100 
ELN vs ELY 52 ‘ : 2 4 50 96 
SWDN vs SWDY 52 2 F i ‘ 52 100 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian or Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically 
disadvantaged, ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 


Table 6.5 Differential Item Functioning for Mathematics Grade 3 


DIF Comparison   Total N of Unique Items   C- DIF (N, % of Total)   B- DIF (N, % of Total)   A DIF (N, % of Total)   B+ DIF (N, % of Total)   C+ DIF (N, % of Total)
Male vs Female 77 1 1 75 97 1 1 
White vs Black 77 6 8 68 88 3 4 
White vs Hispanic 77 3 4 74 96 
White vs Asian 77 : ‘ 67 87 9 12 1 1 
White vs Amerindian 77 2 3 75 97 
White vs Pacific Islander 77 1 1 75 97 1 1 
White vs Multiracial 77 1 1 75 97 1 1 
NoEcnDis vs EcnDis 77 , i 77 100 
ELN vs ELY 77 1 1 76 99 
SWDN vs SWDY 77 2 3 75 97 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native Hawaiian 
or Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically disadvantaged, ELN = 
not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 


Section 7: IRT Calibration and Scaling


7.1 Overview 


Multiple operational core forms were administered for each grade in English language arts/literacy (ELA/L) and 
mathematics assessments. The purpose of the item response theory (IRT) calibration and scaling was to place all 
operational items for a single grade/subject onto a common scale. For the ELA/L computer-based tests (CBTs), the 
IRT parameters were post-equated. This section describes procedures used to calibrate and scale the post-equated 
operational assessments. Because ELA/L paper-based tests (PBTs) and all mathematics tests were pre-equated, 
much of the discussion in this section will not apply; however, the parameters used to construct the conversion 
tables for these tests are presented in this section. 


In this section of the technical report, the following topics related to IRT calibration and scaling are discussed: 


Calibration: 
7.2 IRT Data Preparation 


7.3 Description of the Calibration Process 
7.4 Model Fit Evaluation Criteria 
7.5 Items Excluded from Score Reporting 


Scaling: 

7.6 Scaling Parameter Estimates 

7.7 Items Excluded from Linking Sets 

7.8 Correlations and Plots of Scaling Item Parameter Estimates 
7.9 Scaling Constants 

7.10 Summary Statistics and Distributions from IRT Analyses 


7.2 IRT Data Preparation 


7.2.1 Overview 


The post-equating was based on the majority of students testing in the spring administration. All student response 
data in the samples for operational items were used to create the IRT sparse data matrices for the concurrent 
calibration. IRT sparse data matrices combine student data across forms within administration mode. Items on the 
non-accommodated forms are included in the post-equating analysis. Table 7.1 lists the number of items and 
equating sample size for the post-equated assessments. 


Table 7.1 Counts and Number of Items in the ELA/L IRT Calibration Files


Grade Count Items 
3 287,704 46 
4 322,383 62 
5 331,254 64 
6 332,892 62 
7 325,874 60 
8 322,373 62 
9 120,185 34 
10 183,715 62 
11 33,274 44 


7.2.2 Student Inclusion/Exclusion Rules 


The following are the IRT valid case criteria. These criteria are the same as the student inclusion/exclusion rules used 
to evaluate and filter data prior to conducting the operational item analysis (IA) and differential item functioning 
(DIF) analyses (steps 1-5). 

1. All records with an invalid form number were excluded.


2. All records that were flagged as “void” were excluded.


3. Records in which the student attempted fewer than 25 percent of the items in any unit were excluded.


4. For students with more than one valid record, the record with the higher raw score was chosen. If the raw
scores were the same, the record with the higher attempted rate across all operational units was chosen.


5. Records for students with administration issues or anomalies were excluded.
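

The valid case criteria above can be sketched as a simple filter. This is an illustration under stated assumptions, not the operational code: the record keys (`student_id`, `valid_form`, `void`, `anomaly`, `unit_attempt_rates`, `raw_score`, `overall_attempt_rate`) are hypothetical, and the sketch applies the exclusion rules before choosing among a student's remaining records, an ordering the report does not specify.

```python
from typing import Dict, List


def select_valid_records(records: List[dict]) -> List[dict]:
    """Apply the exclusion rules, then keep one record per student."""
    best: Dict[str, dict] = {}
    for rec in records:
        # Drop invalid forms, voided records, and records flagged for anomalies.
        if not rec["valid_form"] or rec["void"] or rec["anomaly"]:
            continue
        # Drop records with fewer than 25 percent of items attempted in any unit.
        if min(rec["unit_attempt_rates"]) < 0.25:
            continue
        # Prefer the higher raw score; break ties on the overall attempt rate.
        rank = (rec["raw_score"], rec["overall_attempt_rate"])
        current = best.get(rec["student_id"])
        if current is None or rank > (current["raw_score"], current["overall_attempt_rate"]):
            best[rec["student_id"]] = rec
    return list(best.values())
```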


7.2.3 Items Excluded from IRT Sparse Matrices 


Pearson conducted an initial scoring and key check. Items identified by Pearson as “spoiled” (also referred to as “do 
not use (DNU)”) were listed and excluded from the analyses. When the IRT sparse data matrices were created, all 
items were included in the files unless they were marked as “spoiled” by Pearson. 


7.2.4 Omitted, Not Reached, and Not Presented Items 


In the student data files, some items were identified as omitted, not reached, or not presented items depending on
the student response data. Item response scores for omits were recoded as “0” in the IRT sparse matrix files unless
the omitted items were at the end of the test or unit. These items were treated as not reached, that is, items that the
student probably did not reach or try to answer. Not reached items were counted as missing or no response, and
therefore did not contribute to the item statistics.


7.2.5 Quality Control of the IRT Sparse Matrix Data Files 


The IRT sparse data matrices were created by the primary analysts and replicators from Pearson and HumRRO. The
matrices were checked for quality and accuracy by comparing the number of students (counts), item category
frequencies, and item statistics (e.g., average item score values) between Pearson and HumRRO. Since the same
inclusion rules for students were used, all counts, category frequencies, and statistics for all items matched. All
discrepancies in counts were resolved. The programs used to create the IRT statistics were independent, so the QC
procedure involved parallel computing. Table 7.1 shows the counts and number of items in the CBT IRT sparse data
matrices for each grade in ELA/L.


New Meridian February 28, 2020 Page 57
2019 Technical Report


7.3 Description of the Calibration Process 


The IRT calibrations were performed only on the ELA/L CBT tests. The form-to-form linking is established through 
internal and external common items selected during test construction to represent the blueprint. 


7.3.1 Two-Parameter Logistic/Generalized Partial Credit Model 


The operational IRT analyses were conducted by both Pearson and HumRRO. The operational items in the IRT sparse 
data matrix were concurrently calibrated with the two-parameter logistic/generalized partial credit model (2PL/GPC: 
Muraki, 1992). The 2PL/GPC is denoted 


$$
P_{im}(\theta_j) = \frac{\exp\left[\sum_{k=0}^{m} D a_i (\theta_j - b_i + d_{ik})\right]}{\sum_{v=0}^{M_i - 1} \exp\left[\sum_{k=0}^{v} D a_i (\theta_j - b_i + d_{ik})\right]} \tag{7-1}
$$


where $D a_i (\theta_j - b_i + d_{i0}) = 0$; $P_{im}(\theta_j)$ is the probability of a student with $\theta_j$ getting score $m$ on item $i$; $D$ is the IRT
scale constant (1.7); $a_i$ is the discrimination parameter of item $i$; $b_i$ is the item difficulty parameter of item $i$; $d_{ik}$
is the $k$th step deviation value for item $i$; $M_i$ is the number of score categories of item $i$ with possible item
scores as consecutive integers from zero to $M_i - 1$; and $v$ sequences through each response category through $M_i - 1$.
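

To make the model concrete, the following Python sketch evaluates the category probabilities for a single item under the 2PL/GPC form given above. It is a minimal illustration, not the calibration software: the function name is hypothetical, the step deviations are passed for categories 1 through M - 1 with the category-0 term fixed at zero, and the exponents are shifted by their maximum for numerical stability.

```python
import math
from typing import List, Sequence


def gpc_probs(theta: float, a: float, b: float, d: Sequence[float], D: float = 1.7) -> List[float]:
    """Return P(score = 0), ..., P(score = M - 1) for one item.

    d holds the step deviations d_1 ... d_{M-1}; the k = 0 term is defined as 0.
    """
    # Cumulative sums of D * a * (theta - b + d_k), starting from 0 for score 0.
    z = [0.0]
    for d_k in d:
        z.append(z[-1] + D * a * (theta - b + d_k))
    z_max = max(z)
    expz = [math.exp(v - z_max) for v in z]
    total = sum(expz)
    return [v / total for v in expz]
```

For a dichotomous item (one step with deviation 0), the second probability reduces to the familiar 2PL response function.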


7.3.2 Treatment of Prose Constructed-Response (PCR) Tasks 


The prose constructed-response (PCR) tasks were calibrated at the trait score level (and not as aggregated scores). 
To address the issue of local dependence related to PCR items, a single-calibration “model” approach was used. 
When sample sizes were large (i.e., greater than 10,000 students), the data were manipulated using random 
assignment, by selecting one of the two traits for each PCR item for each student. Then one calibration was run so 
that all trait parameters were independently estimated. When sample sizes were smaller (i.e., field-test samples), a 
multiple-calibration “model” approach was used. In this alternative approach, the same data set was calibrated two 
times, each trait represented in one of the two data sets for all students. Then the PCR traits were scaled onto the 
base scale using non-PCR items as anchor items. These two trait calibration approaches addressed the issue of local 
dependence while allowing for the accurate calculation of claim scores and the proper weighting of traits in the 
summative scale scores. 


7.3.3 IRT Item Exclusion Rules (Before Calibration) 


In addition to checking IRT data for accuracy, Pearson conducted item analyses (IA) to identify items that were not 
performing as expected and should be considered for removal from calibration and score reporting. The following 
are the criteria Pearson used to flag extremely problematic items to be dropped from calibration. All “non-spoiled” 
items were included in the IRT data matrices; however, the IRTPRO calibration software (Cai et al., 2011) control files
were used to exclude from calibration items flagged for the following reasons:


1. A weighted polyserial correlation less than 0.0


2. An average item score of 0.0


3. 100 percent of the students having the same item score, such as:
   o 100 percent omitted the item
   o 100 percent received the same score
   o 100 percent of the responses were at the same score after collapsing score categories due to low
     frequencies, or
   o 100 percent of the responses were not presented or not reached


4. Insufficient sample sizes for the selected IRT model combinations (i.e., 300 for the 2PL/GPC)


5. High omit rates (i.e., greater than 50 percent) on one or more forms (usually an indication that an item may
not be functioning correctly on all forms)


A master list of all problematic items before and after calibration was maintained and all flagged and potentially
flawed items were brought to the Priority Alert Task Force (consisting of New Meridian and participating State Leads
for member states or agencies) for content and statistical reviews. Ultimately, the decision about whether to keep
or exclude an item from score reporting was made by the Priority Alert Task Force.


7.3.4 IRTPRO Calibration Procedures and Convergence Criteria 


The data were calibrated concurrently across forms using the 2PL/GPC model combination. The primary goal was to
place the operational item data within each content area and grade/subject on a common difficulty scale. The
following are the steps used to calibrate the operational item response data:


1. Using the IRT sparse data matrices, concurrent calibrations were conducted using commercially available
IRTPRO for Windows (version 4.2) on CBT data within each grade/subject.


2. IRTPRO Calibration Settings: The logistic partial credit model was specified using the scale constant of 1.0.
The prior distributions for latent traits were set to a mean of zero and a standard deviation of one. The
number of quadrature points used in the estimation was set to 49. The slope starting value was set or
updated before each run.


3. Each IRTPRO run was inspected for convergence and for any unexpected item-parameter estimates. The
PRIORS command in IRTPRO provided a prior on IRT parameters to constrain the calibration so that
convergence was more likely. Specifically, the option “Guessing[0]” indicated that the prior was placed on the
lower asymptote for the 3-PL model, using a normal distribution with a mean of -1.4 and a
standard deviation of 1. For these items, an inspection of item-level statistics and model-data fit plots was
sufficient to ensure that item parameters were acceptable if convergence was reached. Item information
functions from the IRTPRO output may also be reviewed. Pearson verified that the maximum number of EM
(expectation-maximization) cycles was not reached (reaching the maximum would indicate that the program
did not converge).


4. To convert IRTPRO item parameters to the commonly used logistic parameter presentation (called new
item parameters), the following formula was used since IRTPRO uses 1.0 for a scaling constant. There was
no need to transform b- and c-parameters from IRTPRO output. Please note that all unscaled and scaled item
parameters were kept on the theta scale. For 2PL models:
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= Girpro 


new 1 7 
New a-parameter: 2 (7-2) 
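

As a minimal sketch of Equation 7-2, the conversion is a single division. The names `D` and `to_new_a` are illustrative only and are not taken from IRTPRO or from the operational scripts.

```python
# Conventional scaling constant that brings the logistic metric close to the
# normal-ogive metric. IRTPRO itself uses a constant of 1.0.
D = 1.7


def to_new_a(a_irtpro: float) -> float:
    """Convert an IRTPRO slope to the 'new' a-parameter of Equation 7-2."""
    return a_irtpro / D


# The b-parameter is left unchanged, as stated in the text.
a_new = to_new_a(0.85)  # 0.85 / 1.7
```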


5. Pearson reported any need for item-calibration decisions, including convergence issues and extreme 
parameter estimates, along with proposed resolutions, to the Priority Alert Task Force. Anticipated 
resolutions included fixing the slope parameters to a minimum .10 value, fixing the guessing parameter to a 
rational value (1 divided by number of options), and fixing the difficulty parameters at an upper or lower 
bound, depending on the nature of the problem. If extreme b-parameter values were observed (e.g., > 100) 
and the a-parameter values for these items were low (i.e., < 0.10), it was recommended that the prior for 
the a-parameter be set to 0.5. 


6. Dropping an item from further processing or dropping an item and rerunning IRTPRO was performed only if 
it was needed after communication with HumRRO and the Priority Alert Task Force. 


7. Inspection of model-data fit plots was helpful in deciding parameter constraints and acceptability of 
parameter fit. Documentation of each step, after resolution of any issues, was provided by Pearson to New 
Meridian and HumRRO. 
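

The screening rules mentioned in step 5 can be sketched as a simple filter. This is an illustration under stated assumptions, not the operational procedure: the class `ItemEstimate`, the function `review_flags`, and the flag labels are hypothetical, and applying the "> 100" rule to the absolute value of b is an assumption.

```python
from dataclasses import dataclass

MIN_SLOPE = 0.10   # slopes below this value were treated as low in step 5
EXTREME_B = 100.0  # example bound for an extreme b-parameter in step 5


@dataclass
class ItemEstimate:
    item_id: str
    a: float  # slope estimate
    b: float  # difficulty estimate


def review_flags(item: ItemEstimate) -> list[str]:
    """Return the reasons an item would be raised for a calibration decision."""
    flags = []
    if item.a < MIN_SLOPE:
        flags.append("low_slope")
    if abs(item.b) > EXTREME_B:  # assumption: the bound applies in both directions
        flags.append("extreme_b")
    return flags


items = [ItemEstimate("item_01", 0.62, 0.35), ItemEstimate("item_02", 0.05, 150.0)]
flagged = {i.item_id: review_flags(i) for i in items if review_flags(i)}
```

Any item that appears in `flagged` would still go through the human review described in steps 5 through 7; the filter only builds the list.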


7.3.5 Calibration Quality Control 


To ensure IRT calibrations and conversion tables were produced accurately, HumRRO replicated the IRT calibrations 
and the generation of the score conversion tables. Both Pearson and HumRRO used the same calibration software, 
IRTPRO. Meetings were held, as needed, so that Pearson and HumRRO could provide status reports and discuss 
issues related to the IRT work. Pearson performed quality control comparisons between the Pearson and HumRRO 
item parameter estimates to identify any differences. 


Specifically, the following quality control analyses/comparisons were completed: 


1. Verified all items were treated the same way (i.e., similar score distributions) 


2. Compared IRT item parameter estimates by Pearson and HumRRO (i.e., IRT a-, b-, and d-parameter 
estimates) 


3. Compared the scaling constants for the common item linking sets 
4. Compared scaled CBT parameter estimates generated by Pearson and HumRRO 


5. Compared all conversion tables produced by Pearson and HumRRO 


Exact matches were found between all Pearson and HumRRO conversion tables before scores were reported. 


7.4 Model Fit Evaluation Criteria 


The usefulness of IRT models is dependent on the extent to which they effectively reflect the data. As discussed by 
Hambleton et al. (1991), “The advantages of item response models can be obtained only when the fit between the 
model and the test data of interest is satisfactory. A poorly fitting IRT model will not yield invariant item and ability
parameters” (p. 53).


After convergence was achieved for each IRT data set, the IRT model fit was evaluated by doing the following:


1. Calculating the Q1 statistic and comparing it to a criterion score


2. Calculating the G² statistic and comparing it to a criterion score


3. Reviewing graphical output for all items 


The Q1 statistic (Yen, 1981) was used as an index of correspondence between observed and expected performance.
To compute Q1, first the estimated item parameters and student response data (along with observed item scores)
were used to estimate student ability (θ). Next, expected performance was computed for each item using
students’ ability estimates in combination with estimated item parameters. Differences between expected item
performance and observed item performance were then compared at 10 intervals across the range of student
achievement (with approximately the same number of students per interval). Q1 was computed as a ratio involving
expected and observed item performance. Q1 is interpretable as a chi-squared (χ²) statistic, which can be
compared to a critical chi-squared value to make a statistical inference about whether the data (observed item
performance) were consistent with what might be observed if the IRT model was true (expected item performance).


Q1 is not directly comparable across different item types because items with different numbers of IRT parameters
have different degrees of freedom (df). For that reason, a linear transformation (to a Z-score, ZQ1) was applied to
Q1. This transformation also made item fit results easier to interpret and addressed the sensitivity of Q1 to sample
size.


To evaluate item fit, Yen’s Q1 statistic was calculated for all items. Q1 is a fit statistic that compares observed and
expected item performance. MAP (maximum a posteriori) estimates from IRTPRO were used as student ability
estimates. For dichotomous items, Q1 was computed as

$$Q_{1i} = \sum_{j=1}^{10} \frac{N_{ij}\,(O_{ij} - E_{ij})^2}{E_{ij}\,(1 - E_{ij})} \tag{7-3}$$

where $N_{ij}$ was the number of students in interval (or group) j for item i, $O_{ij}$ was the observed proportion of the
students for the same cell, and $E_{ij}$ was the expected proportion of the students for the same interval. The
expected proportion was computed as

$$E_{ij} = \frac{1}{N_{ij}} \sum_{a \in j} P_i(\hat{\theta}_a) \tag{7-4}$$

where $P_i(\hat{\theta}_a)$ was the item characteristic function for item i evaluated at the ability estimate of student a.
The summation is taken over students in interval j.
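

The calculation in Equations 7-3 and 7-4 can be sketched as follows. This is a minimal illustration, not the operational code: the function names are hypothetical, the intervals are formed by sorting students on their ability estimates and cutting the list into ten nearly equal groups, and the 2PL helper assumes the a-parameter is on the 1.7 metric.

```python
import math


def p_2pl(theta, a, b, d=1.7):
    """2PL item characteristic function (d=1.7 assumes the 'new' a metric)."""
    return 1.0 / (1.0 + math.exp(-d * a * (theta - b)))


def q1_dichotomous(thetas, scores, icc, n_intervals=10):
    """Yen's Q1 for one dichotomous item.

    thetas: ability estimates, one per student
    scores: 0/1 item scores in the same order
    icc:    callable returning the model probability of a correct response
    """
    order = sorted(range(len(thetas)), key=lambda s: thetas[s])
    n = len(order)
    q1 = 0.0
    for j in range(n_intervals):
        cell = order[j * n // n_intervals:(j + 1) * n // n_intervals]
        if not cell:
            continue
        n_ij = len(cell)
        o_ij = sum(scores[s] for s in cell) / n_ij       # observed proportion
        e_ij = sum(icc(thetas[s]) for s in cell) / n_ij  # expected proportion (7-4)
        q1 += n_ij * (o_ij - e_ij) ** 2 / (e_ij * (1.0 - e_ij))
    return q1


# Ten students at theta = b, all answering correctly: O = 1.0 and E = 0.5.
example_q1 = q1_dichotomous([0.0] * 10, [1] * 10,
                            lambda t: p_2pl(t, a=1.0, b=0.0), n_intervals=1)
```

In the example, the single interval contributes 10 × (1.0 - 0.5)² / (0.5 × 0.5) = 10, which is the value of `example_q1`.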


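The generalization of Q1 for items with multiple response categories is

$$\text{Gen } Q_{1i} = \sum_{j=1}^{10} \sum_{k=1}^{m_i} \frac{N_{ij}\,(O_{ikj} - E_{ikj})^2}{E_{ikj}} \tag{7-5}$$

where

$$E_{ikj} = \frac{1}{N_{ij}} \sum_{a \in j} P_{ik}(\hat{\theta}_a) \tag{7-6}$$

Here k indexes the $m_i$ score categories of item i, and $O_{ikj}$ and $E_{ikj}$ are the observed and expected
proportions of students in interval j who responded in category k.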


Both Q1 and generalized Q1 results were transformed to ZQ1 and were compared to a criterion ZQ1,crit to determine
acceptable fit. The conversion formula was

$$ZQ_1 = \frac{Q_1 - df}{\sqrt{2\,df}} \tag{7-7}$$

and

$$ZQ_{1,\text{crit}} = \frac{N}{1500} \times 4 \tag{7-8}$$


where df is the degrees of freedom. The degrees of freedom is equal to the number of independent cells less the 


number of independent item parameters. For example, the degrees of freedom for polytomous items equals [10 x 
(number of score categories—1) — number of independent item parameters]. For the GPCM, the number of 
independent item parameters equals 1 (for the a-parameter) plus the number of step values (e.g., for an item scored 
0, 1, 2, 3: there are 3 independent step values—the b-parameter is simply the mean of the step values and is not, 
therefore, independent). 
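

The degrees-of-freedom rule and the two conversions (Equations 7-7 and 7-8) can be sketched as below. The function names are illustrative. `gpcm_df` assumes ten intervals and counts one slope plus one step value per score point, which for a dichotomous 2PL item reduces to 10 - 2 = 8.

```python
import math


def gpcm_df(n_categories: int, n_intervals: int = 10) -> int:
    """Independent cells minus independent parameters for a 2PL/GPC item."""
    n_params = 1 + (n_categories - 1)  # a-parameter plus the step values
    return n_intervals * (n_categories - 1) - n_params


def zq1(q1: float, df: int) -> float:
    """Standardized fit statistic of Equation 7-7."""
    return (q1 - df) / math.sqrt(2 * df)


def zq1_criterion(n_students: int) -> float:
    """Sample-size-dependent flagging criterion of Equation 7-8."""
    return n_students / 1500 * 4


def poor_fit(q1: float, n_categories: int, n_students: int) -> bool:
    """Flag an item whose ZQ1 exceeds the criterion."""
    return zq1(q1, gpcm_df(n_categories)) > zq1_criterion(n_students)


# An item scored 0, 1, 2, 3 has 4 categories: df = 10 * 3 - (1 + 3) = 26.
df_example = gpcm_df(4)
```

Because the criterion grows with N, the same ZQ1 value can be flagged in a small sample but accepted in a large one, which is how the transformation addresses the sensitivity of Q1 to sample size.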


If Q1 is found to be excessively sensitive (i.e., a large number of items are flagged for poor fit, even if their item fit
plots look reasonable), a likelihood-ratio chi-squared statistic may be computed for each item (Muraki & Bock,
1997):

$$G_i^2 = 2 \sum_{j=1}^{J_i} \sum_{k=1}^{m_i} r_{ikj} \ln \frac{r_{ikj}}{N_{ij}\, P_{ik}(\bar{\theta}_j)} \tag{7-9}$$

where $r_{ikj}$ is the observed frequency of the k-th categorical response to item i in interval j, $N_{ij}$ is the number of
students in interval j for item i, $P_{ik}(\bar{\theta}_j)$ is the expected probability of observing the k-th categorical
response to item i for the mean θ in interval j, and $J_i$ is the number of intervals remaining after neighboring
intervals are merged, if necessary, to avoid expected values, $N_{ij} P_{ik}(\bar{\theta}_j)$, less than 5. To conduct a
standard hypothesis test, the number of degrees of freedom is equal to the number of intervals, $J_i$, multiplied by
$m_i - 1$.


As an alternative to a traditional hypothesis test, the “contingency coefficient” (effect size; Barton & Huynh, 2003)
was computed:

$$C = \sqrt{\frac{\chi^2}{\chi^2 + N}} \tag{7-10}$$

In this formula, G² was substituted for χ², and N is the sample size on which the IRT parameters were estimated.


According to Cohen (1988, pp. 224-225), values of C below .10 are considered insignificant, .10+ small, .287+ 
medium, and .447+ large. A threshold of .35 is recommended (i.e., flag items for which C > .35). 
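

The G² statistic (Equation 7-9) and the contingency coefficient (Equation 7-10) can be sketched together, since the first feeds the second. This is an illustration with hypothetical function names. It assumes that neighboring intervals have already been merged so that no expected count falls below 5, and it skips cells with a zero observed count, which contribute nothing to the sum.

```python
import math


def g_squared(obs_freq, exp_prob):
    """Likelihood-ratio chi-squared for one item.

    obs_freq[j][k]: observed count of category k in interval j
    exp_prob[j][k]: model probability of category k at the mean theta of interval j
    """
    total = 0.0
    for counts, probs in zip(obs_freq, exp_prob):
        n_j = sum(counts)  # number of students in interval j
        for r, p in zip(counts, probs):
            if r > 0:
                total += r * math.log(r / (n_j * p))
    return 2.0 * total


def contingency_coefficient(chi2: float, n: int) -> float:
    """Effect size of Equation 7-10, with G-squared substituted for chi-squared."""
    return math.sqrt(chi2 / (chi2 + n))


def is_flagged(c: float, threshold: float = 0.35) -> bool:
    """Apply the recommended flagging rule C > .35."""
    return c > threshold


# Two intervals of 100 students each for a dichotomous item (made-up counts).
obs = [[60, 40], [30, 70]]
exp = [[0.5, 0.5], [0.3, 0.7]]
g2 = g_squared(obs, exp)
c = contingency_coefficient(g2, n=200)
```

In this made-up example the second interval matches the model exactly, so only the first interval contributes to G², and the resulting C is well below the .35 threshold.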


An item-fit plot was created for each item. Item-fit plots show observed and expected average scores for each
interval. Figure 7.1 is an example for an ELA/L five-category item calibrated with the 2PL/GPC model. This item had
an n-count of 44,658, Q1 = 1266.64, ZQ1 = 147.21, and a criterion ZQ1,crit = 237.02.
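

As a quick arithmetic check of the reported ZQ1, assume the five-category item has one slope and four step values under the GPC model, so that the degrees-of-freedom rule given earlier yields df = 35. Equation 7-7 then reproduces the reported value:

$$df = 10 \times (5 - 1) - (1 + 4) = 35, \qquad ZQ_1 = \frac{1266.64 - 35}{\sqrt{2 \times 35}} = \frac{1231.64}{8.3666} \approx 147.21$$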


[Plot of observed and expected probability by score category (Obs 0 to Obs 4, Exp 0 to Exp 4); x-axis: theta]


Figure 7.1 ELA/L Item Fit Plot: Observed and Expected Probability


7.5 Items Excluded from Score Reporting


As mentioned previously, after calibration and model fit evaluation were completed, a master list of all problematic
items, if warranted, was brought to the Priority Alert Task Force. The Task Force reviewed each item, its content,
and the statistical properties, and made decisions about whether to include the item in the operational scores. 
Sometimes, an item was rejected because it appeared to have content issues, and sometimes an item was excluded 
because it had unreasonable IRT parameters or showed extremely poor IRT model fit. Ultimately the decision about 
whether to keep or exclude each flagged item was made by the Task Force. 


7.5.1 Item Review Process


The following types of problematic items were brought to the Priority Alert Task Force for evaluation and an
“include or exclude” determination:

• Extremely difficult items (e.g., an item with a p-value less than 0.02)

• Items with low a-parameter estimates (e.g., slope less than 0.10)

• Items flagged for subgroup DIF


The primary goal was to minimize the number of items dropped from the operational test forms. An equally 
important goal was to not advantage or disadvantage any students. 


7.5.2 Count and Percentage of Items Excluded from Score Reporting 


All items were calibrated except for 30 items from grade 9 ELA/L and 18 items from grade 11 ELA/L, which were
excluded from IRT calibration because they were unique to forms administered to small groups of students. For
these items, the item statistics from the prior administration were more stable and provided more accurate
estimates of the item parameters. No items were removed after the IRT calibration. Table 7.2 presents the count
and percentage of CBT items excluded from IRT calibration along with the reasons the items were excluded.


Table 7.2 Number and Percentage of ELA/L Items Excluded from IRT Calibration


        Total n of    n of Items    Percent     Reason Excluded:
Grade   CBT Items     Excluded      Excluded    Small Sample Size
3       46            0             0%
4       62            0             0%
5       64            0             0%
6       62            0             0%
7       60            0             0%
8       62            0             0%
9       64            30            47%         Yes
10      62            0             0%
11      62            18            29%         Yes


7.6 Scaling Parameter Estimates 


Year-to-year linking was performed on all ELA/L CBTs to transform IRT parameters to the base IRT scale. The linking 
analyses included common-item sets. The linking methodology was based on the Stocking and Lord (1983) test 
characteristic curve scale transformation method. Year-to-year linking transforms IRT parameters from different 
years (or administrations) onto the same underlying IRT scale. 


HumRRO also used STUIRT (Kim & Kolen, 2004) software to transform their IRTPRO item parameter estimates onto 
the IRTPRO scales for each grade/subject. HumRRO’s scaling constants were compared to those generated by 
Pearson and found to exactly match. 


7.7 Items Excluded from Linking Sets 
Robust Z (Huynh & Meyer, 2010) and Weighted Root Mean Square Difference (WRMSD) were used to identify outlier 
items in the linking sets. The following rules were used to identify items for possible exclusion from the linking sets: 


1. Exclude an item from the common-item set if different amounts of collapsing resulted in a different number 
of response categories. 


2. Flag and potentially exclude an item from the common-item set if the weighted polyserial correlation, 
based on the item analysis, was less than 0.10. 


3. Exclude items dropped by the Priority Alert Task Force (i.e., due to content or parameter estimation issues). 


4. Exclude an item if the scoring rules changed. 


After removing items, if necessary, the following steps were performed: 


1. Implement the Robust Z approach to see if any common items are flagged. 
2. Run the initial Stocking and Lord procedure using the STUIRT software. 


3. Calculate WRMSD and check to see if any common items exceed the threshold.


4. Re-run STUIRT after removing the items flagged by Robust Z and WRMSD.


5. Compare the slopes and intercepts from steps 2 and 4. 


Table 7.3 lists the flagging criteria for the WRMSD. 


Table 7.3 WRMSD Flagging Criteria for Inspection and Possible Removal of Linking Items 


Categories   Points   WRMSD/Points   WRMSD
2            1        0.100          0.100
3            2        0.075          0.150
4            3        0.075          0.225
5            4        0.075          0.300
6            5        0.075          0.375
7            6        0.075          0.450
>=8          >=7      0.090          0.999
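

The criteria in Table 7.3 amount to a lookup keyed on the number of score points. The sketch below encodes the table values directly. The names are illustrative, and treating "exceeds the threshold" as a strict inequality is an assumption.

```python
# WRMSD flagging thresholds from Table 7.3, keyed by the item's score points.
WRMSD_THRESHOLD = {1: 0.100, 2: 0.150, 3: 0.225, 4: 0.300, 5: 0.375, 6: 0.450}
WRMSD_THRESHOLD_7_PLUS = 0.999  # items worth 7 or more points


def wrmsd_threshold(points: int) -> float:
    """Return the Table 7.3 flagging threshold for an item with `points` score points."""
    if points < 1:
        raise ValueError("an item must have at least one score point")
    if points >= 7:
        return WRMSD_THRESHOLD_7_PLUS
    return WRMSD_THRESHOLD[points]


def flag_for_inspection(wrmsd: float, points: int) -> bool:
    """True when a linking item's WRMSD exceeds its threshold."""
    return wrmsd > wrmsd_threshold(points)


# A 4-point item with WRMSD 0.31 exceeds 0.300; a 2-point item with 0.12 does not.
checks = [flag_for_inspection(0.31, 4), flag_for_inspection(0.12, 2)]
```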


When inspecting items flagged for exclusion from the linking sets, content representation was also considered to 
avoid removing large numbers of items from the same subclaim. Table 7.4 presents the total number of common 
items, items excluded from the year-to-year linking sets, and items kept in the linking sets for each grade for ELA/L. 
The final number of linking items ranged from 8 (in grade 11) to 28 (in grade 8). Grades 3, 4, and 5 had the largest 
number of items removed from the linking sets due to Robust Z for the a-parameter and b-parameter, some of 
which were also flagged for high WRMSD. 


Table 7.4 Number of ELA/L Items Excluded from the Year-to-Year Linking Sets 


                                                       Number of Excluded Items by Reason for Exclusion
        Total n of     Number     Final Number     Low          Robust Z   Robust Z   High
Grade   Common Items   Excluded   in Linking Set   Polyserial   IRT_a      IRT_b      WRMSD
3       24             5          19               0            3          2          0
4       27             5          22               0            3          2          0
5       20             5          15               0            1          4          1
6       28             2          26               0            2          0          0
7       24             3          21               0            1          2          1
8       29             1          28               0            1          0          0
9       14             1          13               0            1          0          0
10      31             4          27               0            1          3          0
11      10             2          8                0            1          1          0


Note: WRMSD did not flag any additional items for removal from the common item sets. 


7.8 Correlations and Plots of Scaling Item Parameter Estimates 


Once the final group of items for each linking set was determined, the a- and b-parameter estimates were plotted,
and the correlations between the reference and new estimates were calculated separately for the a-parameters and
the b-parameters. Table 7.5 presents the number of linking items, total score points of the linking items, and the
correlation of the a- and b-parameter estimates across years.


Table 7.5 Number of Items, Number of Points, and Correlations for ELA/L Year-to-Year Linking Items


        Number     Number       Parameter Correlations
Grade   of Items   of Points    a-        b-
3       19         42           0.9776    0.9960
4       22         49           0.9922    0.9961
5       15         36           0.9759    0.9981
6       26         58           0.9932    0.9887
7       21         48           0.9849    0.9887
8       28         62           0.9838    0.9929
9       13         29           0.9894    0.9927
10      27         60           0.9920    0.9950
11      8          19           0.9721    0.9602


Figures 7.2 and 7.3 show a selection of plots of the a- and b-parameter estimates for the linking items used in the
year-to-year linking for ELA/L grade 8. For each plot, the x-axis is the original (reference) parameter and the y-axis
is the new parameter after applying the scaling constants.


Figure 7.2 ELA/L Grade 8 Transformed New a- vs. Reference a-Parameter Estimates for Year-to-Year Linking


Figure 7.3 ELA/L Grade 8 Transformed New b- vs. Reference b-Parameter Estimates for Year-to-Year Linking


7.9 Scaling Constants 


Table 7.6 presents the slope and intercept scaling constants for ELA/L for the year-to-year linking, derived from
STUIRT (Kim & Kolen, 2004) using the Stocking and Lord (1983) test characteristic curve procedure. The constants
are similar across grades: the slopes range from 0.9835 to 1.1344, and the intercepts range from 0.0446 to 0.3155.


Table 7.6 Scaling Constants Spring 2018 to Spring 2019 for ELA/L 


Spring 2018 to Spring 2019 
Grade/Subject Slope Intercept 
3 1.0292 0.1130 
4 1.0759 0.1072 
5 1.1013 0.1635 
6 1.1049 0.1744 
7 1.0993 0.1279 
8 1.1344 0.1262 
9 1.0849 0.3002 
10 1.0806 0.3155 
11 0.9835 0.0446 
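

The report does not spell out how the constants in Table 7.6 are applied to individual items. The sketch below assumes the standard linear scale transformation for IRT parameters, in which ability is rescaled as θ* = Aθ + B, so that slopes are divided by A and difficulties are multiplied by A and shifted by B. The function name and the example item are hypothetical.

```python
def to_base_scale(a: float, b: float, slope: float, intercept: float) -> tuple[float, float]:
    """Place new-form (a, b) estimates on the base scale.

    Assumes theta* = slope * theta + intercept, which implies
    a* = a / slope and b* = slope * b + intercept.
    """
    return a / slope, slope * b + intercept


# Grade 8 constants from Table 7.6, applied to a made-up item with a = 0.50, b = 0.20.
GRADE8_SLOPE, GRADE8_INTERCEPT = 1.1344, 0.1262
a_star, b_star = to_base_scale(0.50, 0.20, GRADE8_SLOPE, GRADE8_INTERCEPT)
```

With a slope constant above 1, as in most grades in Table 7.6, transformed slopes shrink slightly and transformed difficulties spread out, which is the pattern the plots in Figures 7.2 and 7.3 are meant to check.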


7.10 Summary Statistics and Distributions from IRT Analyses 


Tables 7.7 through 7.13 present summary statistics for the IRT (b- and a-) parameter estimates, the standard errors
(SEs) of the parameter estimates, and the IRT model fit values (chi-square and adjusted fit) for ELA/L assessments.
The summary statistics for IRT parameter estimates include all the items administered in the spring administration
except the items on the reused forms, if applicable, for which the summary results were reported in the technical
reports of the source administrations. For ELA/L tests, separate tables were created to display the summary of
pre-equated IRT parameter estimates and the summary of post-equated IRT parameter estimates, the latter
reflecting the IRT parameters of the items being post-equated. The summary statistics for standard errors of the
parameter estimates and the IRT model fit values are only provided for the post-equated ELA/L items.


The information is provided by content area (ELA/L and mathematics) for all items at each grade level or course. The 
summary statistics shown include the total number of items and score points, along with the mean, standard 
deviation (SD), minimum, and maximum. 


7.10.1 IRT Summary Statistics for English Language Arts/Literacy 


Table 7.7 shows the pre-equated b- and a-parameter estimates for all ELA/L assessments. Table 7.8 shows the 
source year for the item statistics for each of the ELA/L assessments that were pre-equated. Table 7.9 summarizes 
the b- and a-parameter estimates for the post-equated ELA/L assessments which include post-equated items in 
spring 2019 and pre-equated items. The number of items in Table 7.9 is consistent with Table 7.7. For forms with too 
few student responses or special populations, the item parameters were not post-equated. Table 7.10 presents the 
standard errors (SE) of the post-equated parameters, and Table 7.11 provides model fit information. Only items 
included in the post-equated calibrations are reported in Tables 7.10 and 7.11. IRT summary statistics are provided 
in Appendix 7 for ELA/L for all items, reading-only, and writing-only. 


Table 7.7 Pre-Equated IRT Summary Parameter Estimates for All Items for ELA/L by Grade 


                                          Summary of b Estimates    Summary of a Estimates
Grade  No. of Items  No. of Score Points  Mean  SD  Min  Max        Mean  SD  Min  Max
3 58 128 0.37 0.97 -1.40 3.13 0.59 0.21 0.16 1.01 
4 74 164 0.24 1.29 -6.48 2.29 0.45 0.22 0.17 1.02 
5 66 145 0.28 1.15 -6.27 2.69 0.49 0.23 0.19 1.06 
6 77 172 0.29 0.92 -1.97 4.45 0.51 0.23 0.20 1.13 
7 62 139 0.22 0.70 -1.33 1.86 0.49 0.24 0.17 1.18 
8 72 159 0.13 0.78 -2.03 2.68 0.47 0.23 0.19 1.12 
9 88 197 0.63 0.79 -1.29 2.95 0.52 0.30 0.17 1.44 
10 63 141 0.62 0.75 -0.54 2.81 0.50 0.28 0.13 1.24 
11 62 139 0.88 0.68 -0.67 2.80 0.46 0.23 0.14 1.10


Table 7.8 Pre-Equated IRT Parameter Distribution by Year for All Items for ELA/L by Grade


Grade ALL 2014 2015 2016 2017 2018 
3 58 0 0 0 21 37 
4 74 0 0 0 13 61 
5 66 0 0 0 29 37 
6 77 0 0 0 27 50 
7 62 0 0 10 24 28 
8 72 0 0 5 27 40 
9 88 0 8 14 9 57 
10 63 0 6 4 26 27 
11 62 2 2 0 20 38 


Table 7.9 Post-Equated IRT Summary Parameter Estimates for All Items for ELA/L by Grade 


                                          Summary of b- Estimates   Summary of a- Estimates
Grade  No. of Items  No. of Score Points  Mean  SD  Min  Max        Mean  SD  Min  Max
3 58 128 0.32 0.91 -1.66 2.05 0.60 0.24 0.22 1.24 
4 74 164 0.16 1.55 -9.56 2.35 0.45 0.23 0.12 0.99 
5 66 145 0.25 1.06 -5.38 2.63 0.49 0.21 0.10 0.96 
6 77 172 0.27 0.87 -1.93 2.95 0.50 0.23 0.18 1.16 
7 62 139 0.16 0.73 -1.34 2.37 0.48 0.25 0.17 1.13 
8 72 159 0.13 0.82 -1.88 2.83 0.47 0.25 0.18 1.19 
9 88 197 0.59 0.80 -1.36 2.95 0.51 0.29 0.14 1.23 
10 63 141 0.59 0.79 -0.93 2.85 0.49 0.27 0.14 1.12 
11 62 139 0.92 0.83 -0.67 4.55 0.46 0.24 0.08 1.10


Table 7.10 Post-Equated IRT Standard Errors of Parameter Estimates for All Items for ELA/L by Grade


                                          SE of b- Estimates             SE of a- Estimates
Grade  No. of Items  No. of Score Points  Mean   SD     Min    Max       Mean   SD     Min    Max
3      46            102                  0.006  0.003  0.003  0.017     0.006  0.003  0.003  0.017
4      62            137                  0.004  0.003  0.002  0.016     0.004  0.003  0.002  0.016
5      64            141                  0.005  0.003  0.002  0.014     0.005  0.003  0.002  0.014
6      62            139                  0.004  0.003  0.002  0.017     0.004  0.003  0.002  0.017
7      60            135                  0.005  0.004  0.001  0.019     0.005  0.004  0.001  0.019
8      62            139                  0.005  0.003  0.002  0.019     0.005  0.003  0.002  0.019
9      34            77                   0.006  0.004  0.002  0.021     0.006  0.004  0.002  0.021
10     62            139                  0.005  0.003  0.002  0.017     0.005  0.003  0.002  0.017
11     44            100                  0.012  0.006  0.006  0.029     0.012  0.006  0.006  0.029


Table 7.11 Post-Equated IRT Model Fit for All Items for ELA/L by Grade


                                          G²                                  Q1
Grade  No. of Items  No. of Score Points  Mean    SD      Min    Max          Mean    SD      Min    Max
3      46            102                  2732.6  2016.8  385.7  10703.0      2574.6  2037.9  360.1  11583.4
4      62            137                  3584.0  3089.3  163.8  14358.4      3473.5  3003.8  159.2  14072.8
5      64            141                  2920.3  3540.3  151.3  18025.6      2806.2  3505.6  142.6  17306.2
6      62            139                  3284.2  2606.4  289.7  13658.8      3055.4  2407.7  291.5  11996.2
7      60            135                  3436.0  4207.6  148.1  24499.4      3263.3  4170.4  140.0  26003.2
8      62            139                  3502.7  3075.6  125.0  14717.3      3296.8  2871.1  123.0  12427.9
9      34            77                   2394.4  2548.5  252.1  13398.9      2225.1  2452.2  226.2  12715.7
10     62            139                  2325.6  1874.8  188.5  8318.2       2220.8  1887.5  183.2  8269.9
11     44            100                  565.9   320.9   105.9  1718.9       514.5   294.2   104.4  1666.5


7.10.2 IRT Summary Statistics for Mathematics 


Table 7.12 shows the b- and a-parameter estimates for the mathematics assessments. Table 7.13 shows the source
year for the item statistics for each of the assessments. IRT summary statistics are provided in Appendix 7 for
mathematics for all items, single-select multiple-choice items, constructed-response items, and subclaims.


Table 7.12 IRT Summary Parameter Estimates for All Items for Mathematics by Grade/Course


                                          Summary of b Estimates    Summary of a Estimates
Grade  No. of Items  No. of Score Points  Mean  SD  Min  Max        Mean  SD  Min  Max
3 77 110 -0.28 0.98 -2.40 1.90 0.79 0.24 0.32 1.33 
4 72 112 -0.15 0.95 -2.61 2.54 0.74 0.20 0.38 1.32 
5 71 116 0.02 0.91 -2.21 1.77 0.73 0.27 0.19 1.57 
6 69 121 0.36 0.89 -3.02 1.98 0.72 0.24 0.20 1.30 
7 67 112 0.75 0.95 -1.03 3.36 0.69 0.29 0.19 1.38 
8 64 115 0.91 0.98 -1.12 2.55 0.61 0.21 0.22 1.29 
A1 111 209 1.27 1.03 -0.96 3.62 0.58 0.27 0.16 1.41
GO 118 223 1.16 0.94 -1.25 3.83 0.71 0.31 0.19 1.54 
A2 109 218 1.41 0.92 -1.53 3.67 0.65 0.29 0.18 1.34 
M1 42 81 1.02 0.88 -0.64 2.78 0.62 0.23 0.25 1.39 
M2 41 80 1.58 1.30 -0.67 4.68 0.67 0.31 0.17 1.30 
M3 40 81 1.39 0.94 -0.35 3.32 0.57 0.27 0.17 1.27 
Note: A1 = Algebra I, GO = Geometry, A2 = Algebra II, M1 = Integrated Mathematics I, M2 = Integrated
Mathematics II, M3 = Integrated Mathematics III.


Table 7.13 IRT Parameter Distribution by Year for All Items for Mathematics by Grade/Course


Grade ALL 2014 2015 2016 2017 2018
3 77 0 20 10 25 22 
4 72 1 20 18 24 
5 71 0 15 16 31 
6 69 0 12 23 27 
7 67 0 15 14 6 32 
8 64 0 12 12 13 27 
A1 111 0 9 37 27 38
GO 118 0 23 25 33 37 
A2 109 0 13 20 36 40 
M1 42 0 6 2 21 13 
M2 41 0 10 13 10 8 
M3 40 0 11 10 6 13 
Note: A1 = Algebra I, GO = Geometry, A2 = Algebra II, M1 = Integrated Mathematics I, M2 = Integrated
Mathematics II, M3 = Integrated Mathematics III.


Section 8: Performance Level Setting


8.1 Performance Standards 


Performance standards relate levels of performance on an assessment directly to what students are expected to 
learn. This is done by establishing threshold scores that distinguish between performance levels. Performance level 
setting (PLS) is the process of establishing these threshold scores that define the performance levels for an 
assessment. 


8.2 Performance Levels and Policy Definitions 


For the summative assessments, the performance levels are 


• Level 5: Exceeded expectations

• Level 4: Met expectations

• Level 3: Approached expectations

• Level 2: Partially met expectations

• Level 1: Did not yet meet expectations


More detailed descriptions of each performance level, known as policy definitions, are: 


Level 5: Exceeded expectations 
Students performing at this level exceed academic expectations for the knowledge, skills, and practices contained in
the standards assessed at their grade level or course.


Grades 3-10: Students performing at this level exceed academic expectations for the knowledge, skills, and 
practices contained in the standards for English language arts/literacy (ELA/L) or mathematics assessed at their 
grade level. They are academically well prepared to engage successfully in further studies in this content area. 


Algebra II, Integrated Mathematics III, and ELA/L Grade 11: Students performing at this level exceed academic
expectations for the knowledge, skills, and practices contained in the mathematics and ELA/L standards assessed at
grade 11. They are very likely to engage successfully in entry-level, credit-bearing courses in mathematics and ELA/L, 
as well as technical courses requiring an equivalent command of the content area. Students performing at this level 
are exempt from having to take and pass placement tests in two- and four-year public institutions of higher 
education designed to determine whether they are academically prepared for such courses without need for 
remediation. 


Level 4: Met expectations 
Students performing at this level meet academic expectations for the knowledge, skills, and practices contained in
the standards assessed at their grade level or course.


Grades 3-10: Students performing at this level meet academic expectations for the knowledge, skills, and practices
contained in the standards for ELA/L or mathematics assessed at their grade level. They are academically prepared
to engage successfully in further studies in this content area.


Algebra II, Integrated Mathematics III, and ELA/L Grade 11: Students performing at this level meet academic
expectations for the knowledge, skills, and practices contained in mathematics and ELA/L at grade 11. They are very
likely to engage successfully in entry-level, credit-bearing courses in mathematics and ELA/L, as well as technical
courses requiring an equivalent command of the content area. Students performing at this level are exempt from 
having to take and pass placement tests in two- and four-year public institutions of higher education designed to 
determine whether they are academically prepared for such courses without need for remediation. 


Level 3: Approached expectations 
Students performing at this level approach academic expectations for the knowledge, skills, and practices contained
in the standards assessed at their grade level or course.


Grades 3-10: Students performing at this level approach academic expectations for the knowledge, skills, and 
practices contained in the standards for ELA/L or mathematics assessed at their grade level. They are likely prepared 
to engage successfully in further studies in this content area. 


Algebra II, Integrated Mathematics III, and ELA/L Grade 11: Students performing at this level approach academic
expectations for the knowledge, skills, and practices contained in the ELA/L and mathematics standards assessed at
grade 11. They are likely to engage successfully in entry-level, credit-bearing courses in mathematics and ELA/L, as 
well as technical courses requiring an equivalent command of the content area. Students performing at Level 3 are 
strongly encouraged to continue to take challenging high school coursework in English and mathematics through 
graduation. Postsecondary institutions are encouraged to use additional information about students performing at 
Level 3, such as course completion, course grades, and scores on other assessments to determine whether to place 
them directly into entry-level courses. 


Level 2: Partially met expectations 
Students performing at this level partially meet academic expectations for the knowledge, skills, and practices
contained in the standards assessed at their grade level or course.


Grades 3-10: Students performing at this level partially meet academic expectations for the knowledge, skills, and 
practices contained in the standards for ELA/L or mathematics assessed at their grade level. They will likely need 
academic support to engage successfully in further studies in this content area. 


Algebra II, Integrated Mathematics III, and ELA/L Grade 11: Students performing at this level partially meet
academic expectations for the knowledge, skills, and practices contained in the ELA/L and mathematics standards
assessed at grade 11. They will likely need academic support to engage successfully in entry-level, credit-bearing 
courses, and technical courses requiring an equivalent command of the content area. Students performing at this 
level are not exempt from having to take and pass placement tests designed to determine whether they are 
academically prepared for such courses without the need for remediation in two- and four-year public institutions of 
higher education. 


Level 1: Did not yet meet expectations 
Students performing at this level do not yet meet academic expectations for the knowledge, skills, and practices
contained in the standards assessed at their grade level or course.


Grades 3-10: Students performing at this level do not yet meet academic expectations for the knowledge, skills, 
and practices contained in the standards for ELA/L or mathematics assessed at their grade level. They will need 
academic support to engage successfully in further studies in this content area. 


New Meridian February 28, 2020 Page 74 


2019 Technical Report 


Algebra II, Integrated Mathematics III, and ELA/L Grade 11: Students performing at this level do not yet meet
academic expectations for the knowledge, skills, and practices contained in the ELA/L and mathematics standards
assessed at grade 11. They will need academic support to engage successfully in entry-level, credit-bearing courses 
in college algebra, introductory college statistics, and technical courses requiring an equivalent level of mathematics. 
Students performing at this level are not exempt from having to take and pass placement tests in two- and four-year 
public institutions of higher education designed to determine whether they are academically prepared for such 
courses without need for remediation. 


8.3 Performance Level Setting Process for the Assessment System 


One of the main objectives of the assessment system is to provide information to students, parents, educators, and 
administrators as to whether students are on track in their learning for success after high school, defined as college- 
and career-readiness. To set performance levels associated with this objective, participating states and agencies 
used the evidence-based standard setting (EBSS) method (Beimers et al., 2012) for the PLS process. The EBSS 
method is a systematic method for combining various considerations into the process for setting performance levels, 
including policy considerations, content standards, educator judgment about what students should know and be 
able to demonstrate, and research to support policy goals related to college- and career-readiness. A defined 
multistep process was used to allow a diverse set of stakeholders to consider the interaction of these elements in 
recommending performance level threshold scores for each assessment. 


The seven steps of the EBSS process that were followed in order to establish performance standards for the 
summative assessments are: 


• Step 1: Define outcomes of interest and policy goals

• Step 2: Develop research, data collection, and analysis plans

• Step 3: Synthesize the research results

• Step 4: Conduct pre-policy meeting

• Step 5: Conduct performance level setting (PLS) meetings with panels

• Step 6: Conduct reasonableness review with post-policy panel

• Step 7: Continue to gather evidence in support of standards


A summary of key components within these steps is provided below. Additional detail about each step in the PLS 
process is provided in the Performance Level Setting Technical Report. 


8.3.1 Research Studies 


Participating states and agencies conducted two research studies in support of their policy goals—the benchmarking 
study and the postsecondary educators’ judgment (PEJ) study. The benchmarking study included a review of the 
literature relative to college- and career-readiness as well as consideration of the percentage of students obtaining a 
level equivalent to college- and career-readiness on a set of external assessments (e.g., ACT, SAT, NAEP). The PEJ 
study involved a group of nearly 200 college faculty reviewing items on the Algebra II and ELA/L grade 11 
assessments and making judgments about the level of performance needed on each item to be academically ready 
for an entry-level, credit-bearing college course in mathematics or ELA/L. Additional detail about the benchmarking
study can be found in the Performance Level Setting Technical Report as well as in the PARCC Benchmarking Study
Report. Additional detail about the PEJ study can be found in the Performance Level Setting Technical Report as well
as in the Postsecondary Educators’ Judgment Study Final Report. More information is available online from
https://resources.newmeridiancorp.org/research/.


8.3.2 Pre-Policy Meeting 


Prior to the PLS meetings, a pre-policy meeting was convened to determine reasonable ranges that would be shown 
to panelists during the high school PLS meetings. Pre-policy meeting participants included representatives from both 
K-12 and higher education who served in roles such as commissioner/superintendent, deputy/assistant 
commissioner, state board member, director of assessment, director of academic affairs, senior policy associate, and 
so on. The reasonable ranges recommended by the pre-policy meeting defined the minimum and maximum 
percentage of students that would be expected to be classified as college- and career-ready. The pre-policy meeting 
participants reviewed the test purpose, how the performance standards will be used, and the results of the research 
studies to provide the recommendations for the reasonable ranges without viewing any student performance data. 


8.3.3 Performance Level Setting Meetings 


The task of the PLS committee was to recommend four threshold scores that would define the five performance 
levels for each assessment. Participating states and agencies solicited nominations from all states that had 
administered the assessments in 2014-2015 for panelists to serve on the PLS committees. Nominations were 
solicited both from state departments of public education (K-12) and higher education (primarily for participation 
on the high school panels). When selecting panelists, emphasis was placed on educators who had content
knowledge as well as experience with a variety of student groups, and an attempt was made to balance the panels in
terms of state representation.


Participating states and agencies used an extended modified Angoff (Yes/No) method to collect educator judgments 
on the items. This method asked panelists to review each item on a reference form of the assessment and to make 
the following judgment: 


How many points would a borderline student at each performance level likely earn if they answered the question? 


This extension to the Yes/No standard setting method (Plake et al., 2005) allowed for incorporation of the multipoint 
items by asking educators to evaluate (Yes or No) whether a borderline student would earn the maximum number of 
points on an item, a lesser number of points on an item, or no points on the item. In the case of a single point or 
multiple-choice item, this task simplifies to the standard Yes/No method. 


After receiving training on the PLS procedure, panelists participated in three rounds of judgments for each 
assessment. Within each round, panelists were asked to consider the items in the test form, starting with the 
performance-based assessment (PBA) component and then the end-of-year (EOY) component. Each panelist made a 
judgment for the Level 2 performance level, followed by judgments for the Level 3 performance level, the Level 4 
performance level, and the Level 5 performance level, in this order. The panelists entered their item judgments for 
each round by completing an online item judgment survey. Educator judgments were summed across items to 
create an estimated total score on the reference form for each performance level threshold. Feedback data relative 
to panelist agreement, student performance on the items, and student performance on the test as a whole were 
provided in between each of the three rounds of judgment. Panelists were shown the pre-policy reasonable ranges 
prior to making their Round 1 judgments and again as feedback data following each round of judgment. 
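

The summation of item judgments described above can be illustrated with a short sketch. It is not the operational procedure: the panelist data, dictionary keys, and function names are hypothetical. It sums each panelist's item-level point judgments into an estimated total score per performance level and then takes the panel median, the statistic that Section 8.3.4 reports was reviewed.

```python
from statistics import median


def panelist_thresholds(judgments):
    """Sum one panelist's item-level point judgments into a total score per level.

    judgments maps a performance level to the list of points the panelist
    expects a borderline student to earn on each item of the reference form.
    """
    return {level: sum(points) for level, points in judgments.items()}


def panel_median(thresholds_by_panelist, level):
    """Median of the panelists' estimated total scores for one level."""
    return median(t[level] for t in thresholds_by_panelist)


# Hypothetical judgments from three panelists on a four-item reference form
panel = [
    {"Level 2": [0, 1, 0, 1], "Level 3": [1, 1, 1, 2]},
    {"Level 2": [0, 1, 1, 1], "Level 3": [1, 2, 1, 2]},
    {"Level 2": [1, 1, 0, 1], "Level 3": [1, 2, 2, 2]},
]
totals = [panelist_thresholds(j) for j in panel]
level3_median = panel_median(totals, "Level 3")
```

In this made-up panel, the three estimated Level 3 totals are 5, 6, and 7 raw score points.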


A dry run of the PLS meeting process was held for grade 11 ELA/L and Algebra II in order to evaluate the
implementation of the PLS method with the innovative characteristics of the summative assessments. These content 
areas were selected because they combined all the various aspects of the assessments, including the various types 
of items, scoring rules, and performance level decisions. The dry-run PLS meetings provided the opportunity to 
implement and evaluate multiple aspects of the operational plan for the actual PLS meeting, including pre-work, 
meeting materials, data analysis and feedback, and staff and panelist functions. The results of the dry-run PLS 
meeting were used to implement improvements in the process for the operational PLS meetings. Additional 
information about the methods and results of the dry-run PLS meeting is available in the full report in the 
Performance Level Setting Dry-Run Meeting Report. 


The PLS meetings for the summative assessments were conducted during three one-week sessions. The dates of the 
twelve PLS committee meetings that were conducted are shown in Table 8.1. 


Additional information about the methods and results of the PLS meetings is available in the Performance Level 
Setting Technical Report. 


Table 8.1 Performance Level Setting Committee Meetings and Dates

Dates                 Committees by Subjects and Grades
July 27-31, 2015      Algebra I/Integrated Mathematics I
                      Geometry/Integrated Mathematics II
                      Algebra II/Integrated Mathematics III
                      Grade 9 English Language Arts/Literacy
                      Grade 10 English Language Arts/Literacy
                      Grade 11 English Language Arts/Literacy
August 17-21, 2015    Grades 7 & 8 Mathematics
                      Grades 7 & 8 English Language Arts/Literacy
August 24-28, 2015    Grades 3 & 4 Mathematics
                      Grades 5 & 6 Mathematics
                      Grades 3 & 4 English Language Arts/Literacy
                      Grades 5 & 6 English Language Arts/Literacy


8.3.4 Post-Policy Reasonableness Review 


Performance standards for all summative assessments were recommended by PLS committees and reviewed by the 
Governing Board and (for the Algebra II, Integrated Mathematics III, and ELA/L grade 11 assessments) the Advisory 
Committee on College Readiness as part of a post-policy reasonableness review. This group reviewed both the 
median threshold score recommendations from each committee and the variability in the threshold scores as 
represented by the standard error of judgment (SEJ) of the committee. Adjustments to the median threshold scores 
that were within 2 SEJ were considered to be consistent with the PLS panels’ recommendation. 
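

The 2 SEJ criterion can be sketched in a few lines. The report does not state how the SEJ was computed, so the code below assumes a common definition: the sample standard deviation of the panelists' threshold scores divided by the square root of the number of panelists. The threshold values and function names are hypothetical.

```python
from math import sqrt
from statistics import median, stdev


def standard_error_of_judgment(thresholds):
    """SEJ under the assumed definition: sample SD divided by sqrt(n)."""
    return stdev(thresholds) / sqrt(len(thresholds))


def within_two_sej(thresholds, adjusted):
    """True if an adjusted threshold lies within 2 SEJ of the panel median."""
    sej = standard_error_of_judgment(thresholds)
    return abs(adjusted - median(thresholds)) <= 2 * sej


# Hypothetical panel thresholds (raw score points) for one performance level
panel_thresholds = [38, 40, 41, 42, 44]
sej = standard_error_of_judgment(panel_thresholds)
```

With these made-up values the median is 41 and the SEJ is about 1 point, so an adjustment to 42.5 would count as consistent with the panel recommendation and an adjustment to 44 would not.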


In addition to voting to adopt the performance standards based on the committees’ recommendations, this group 
also voted to conduct a shift in the performance levels to better meet the intended inferences about student 
performance. Holding the college- and career-ready (or on-track) expectations (i.e., the current level 4) constant, 
performance levels above this expectation were combined and performance levels below this expectation were
expanded to create the final system of performance levels with three below and two above the college- and career-
ready (or on-track) expectation. The shift in performance levels was accomplished using a scale anchoring process 
that involved two primary steps. In the first step, the top two performance levels, above college- and career-ready 
(or on-track), were combined into a single performance level and an additional performance level below college- and 
career-ready (or on-track) was created by empirically determining the midpoint between the existing two levels. In 
the second step, the performance level descriptors (PLDs) were updated using items that discriminated student 
performance well at this level to create a PLD aligned with the new empirically determined performance level. At 
this same time, PLDs for all performance levels were reviewed for consistency and continuity. Members of the 
original PLS committees were recruited to participate in this process. Additional information about this process can 
be found in the Performance Level Setting Technical Report. 


Section 9: Quality Control Procedures


Quality control in a testing program is a comprehensive and ongoing process. This section describes procedures put 
into place to monitor the quality of the item bank, test form, and ancillary material development. The quality checks 
for scanning, image editing, scoring, and data screening during psychometric analyses are also outlined. Additional 
quality information can be found in the Program Quality Plan document. 


9.1 Quality Control of the Item Bank 


The summative item bank consists of test passages and items, their associated metadata, and status (e.g.,
operational-ready, field-test ready, released). The items on the assessments were developed by Pearson and
WestEd and were placed in the item bank once created.


The ABBI bank houses the passages and items, art, associated metadata, rubrics, alternate text for use on 
accommodated forms, and text complexity documentation. It provides an item previewer that allows items to be 
viewed and interacted with in the same way students see and interact with items and tools, and manages versioning 
of items with a date/time stamp. It allows reviewers to vote on item acceptance, and to record and retain their 
review notes for later reconciliation and reference. Item and passage review committee participants conducted their 
review in the item banking system. The committee members viewed the items as the student would, and could vote 
to alter the item, accept or reject the item, and record their comments in the system. After each meeting, reports 
were forwarded to New Meridian. The reports were generated by the item banking system and summarized 
feedback from the committee reviewers. 


All new development for the summative assessments is being created within the ABBI system, which employs 
templates to control the consistency of the underlying scoring logic and QTI creation for each item type. The ABBI 
system incorporates a previewer that allows the reviewers to validate the content of the item and validate the 
expected scoring of tasks. It supports the full range of review activities, including content review, bias and sensitivity 
review, expert editorial review, data review, and test construction review. It provides insight into the item edit 
process through versioning. A series of metadata validations at key points in the development cycle provide support 
for metadata consistency. The bank can be queried on the full range of metadata values to support bank analysis. 


9.2 Quality Control of Test Form Development 


Test forms were built based on the established blueprints and targets. The construction process started with
specification and requirement capture to create the test specification document. From there, items were pulled into
forms based on the criteria approved in the test specifications document. After forms composition, the forms went 
through a review process that involved groups from New Meridian, Pearson and participating states. Quality control 
steps were conducted on the items and forms evaluating several item characteristics (e.g., content accuracy, 
completeness, style guide conformity, tools function). Revisions were incorporated into the forms before final 
review and approval. Section 2.2 provides more details on the form development process. 


The forms quality assurance was performed by Pearson’s Assessment and Information Quality (AIQ) organization.
AIQ completed a comprehensive review of all online forms for the administration cycle. This group is part of
Pearson’s larger Organizational Quality group and operates exclusively to validate form operability. The group
validates that the functionality of every online form is working to specifications. The overall functionality and
maneuverability of each form are checked, and the behavior of each item within the form is verified. (Quality
processes for paper forms are described in Section 9.3.)


The items within each form were tested to verify that they operated as expected for students. As a further aspect of 
the testing process, AIQ confirmed that forms were loaded correctly and that the audio was correct when compared
to text. Sections and overviews were reviewed. Technology-enhanced items also were tested as an additional 
measure. As enumerated in the Technology Guidelines for Assessments, user interfaces were compatible with a 
range of common computer devices, operating systems, and browsers. 


Pearson also performed QC tests to verify that a standard set of responses was output to the XML as expected
after the final version of the form was approved. These responses were based on the keys provided in the test map
or a standard open-ended (OE) response string that contained a valid range of characters. The test maps also were
validated against the form layout and item types for correctness as part of these tests. 


Pearson conducted a multifaceted validation of all item layout, rendering, and functionality. Reviewers conducted 
comparisons between the approved item and the item as it appeared in the field-test form or how it previously 
appeared, validated that tools and functions in the test delivery system, TestNav, were accurately applied, and 
verified that the style and layout met all requirements. In addition, answer keys were validated through a formal key 
review process. More details on the test development procedures are provided in Section 2. 


9.3 Quality Control of Test Materials 


Pearson provided high quality materials in a timely and efficient manner to meet the test administration needs. 
Since the majority of printing work was done in-house, it was possible to fully control the production environment, 
press schedule, and quality process for print materials. Additionally, strict security requirements were employed to 
protect secure materials production; Section 3 provides details on the secure handling of test materials. Materials 
were produced according to the style guide and to the detailed specifications supplied in the materials list. 


Pearson Print Service operates under an ISO 9001:2008 Quality Management System and practices
process improvement through Lean principles and employee involvement.


Raw materials (paper and ink) used for scannable forms production were manufactured exclusively for Pearson Print 
Service using specifications created by Pearson Print Service. Samples of ink and paper were tested by Pearson prior 
to use in production. Project specialists were the point of contact for incoming production. 


Purchase orders and other order information were assessed against manufacturing capabilities and assigned to the 
optimal production methodology. Expectations, quality requirements, and cost considerations were foremost in 
these decisions. Prior to release for manufacture, order information was checked against specifications, technical 
requirements, and other communication that includes expected outcomes. Records of these checks were 
maintained. 


Files for image creation flow through one of two file preparation functions: digital pre-press (DPP) for digital print 
methodology, or plateroom for offset print methodology. Both the DPP and plateroom functions verify content, file 
naming, imposition, pagination, numbering stream, registration of technical components, color mapping, workflow, 
and file integrity. Records of these checks are created and saved. 
Offset production requires printing that uses a lithographic process. Offline finishing activities are required to create
books and package offset output. Digital output may flow through an inkjet digital production line (DPL) or a sheet- 
fed toner application process in the Xpress Center. A battery of quality checks was performed in these areas. The 
checks included color match, correct file selection, content match to proof, litho-code to serial number 
synchronization, registration of technical components, ink density controlled by densitometry, inspection for print 
flaws, perforations, punching, pagination, scanning requirements, and any unique features specified for the order. 
Records of these checks and samples pulled from planned production points were maintained. Offline finishing 
included cutting, shrink-wrapping, folding, and collating. The collation process used three inline detection
systems to inspect each book:

• Caliper validation that detects too few or too many pages. This detector stops the collator if an incorrect
caliper reading is registered.
• An optical reader that accepts only one sheet. Two or zero sheets result in a collator stoppage.
• A bar code scanner that checks for the correct bar code for the signature being assembled. An incorrect or
upside-down signature is rejected by the scanner and results in a collator stoppage.


Pearson’s Quality Assurance (QA) department personnel inspected print output prior to collation and shipment. QA 
also supported process improvement, work area documentation, audited process adherence, and established 
training programs for employees. 


9.4 Quality Control of Scanning 


Establishing and maintaining the accuracy of scanning, editing, and imaging processes is a cornerstone of the 
Pearson scoring process. While the scanners are designed to perform with great precision, Pearson implements 
other quality assurance processes to confirm that the data captured from scan processing produce a complete and 
accurate map to the expected results. 


Pearson pioneered optical mark reading (OMR) and image scanning, and continues to improve in-house scanners for 
this purpose. Software programs drive the capture of student demographic data and student responses from the 
test materials during scan processing. Routinely scheduled maintenance and adjustments to the scanner 
components (e.g., camera) maintain scanner calibration. Test sheets are inserted into every batch to test scanner
accuracy and calibration.


Controlled processes for developing and testing software specifications included a series of validation and 
verification procedures to confirm the captured data can be mapped accurately and completely to the expected 
results and that editing application rules are properly applied. 


9.5 Quality Control of Image Editing 


The final step in producing accurate data for scoring is the editing process. Once information from the documents 
was captured in the scanning process, the scan program file was executed, comparing the data captured from the
student documents to the project specifications. The result of the comparison was a report (or edit listing) of 
documents needing corrections or validation. Image Editing Services performed the tasks necessary to correct and 
verify the student data prior to scoring. 
Using the report, editors verified that all unscanned documents were scanned, or the data were imported into the
system through some other method such as flatbed scan or key entry. 


Documents with missing or suspect data were pulled and verified, and corrections or additional data were entered.
Standard edits included:

• Incorrect or double gridding
• Incorrect dates (including birth year)
• Mismatches between pre-ID label and gridded information
• Incomplete names


When all edits were resolved, corrections were incorporated into the document file containing student records. 


Additional quality checks were also performed. These included student n-count checks to make certain: 
• students were placed under the correct header,
• all sheets belonged to the appropriate document,
• documents were not scanned twice, and
• no blank documents existed.
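

Some of these n-count checks lend themselves to automation. The sketch below is illustrative only and does not describe Pearson's systems; the record layout (`doc_id`, `header_id`, `responses`) and the function name are hypothetical. It flags documents scanned twice, blank documents, and headers whose student count differs from the expected count.

```python
from collections import Counter


def n_count_checks(scanned_docs, expected_counts):
    """Return a list of (issue, detail) tuples for one batch of scanned documents.

    scanned_docs: list of dicts with hypothetical keys
        'doc_id', 'header_id', and 'responses' (a string of captured marks).
    expected_counts: dict mapping header_id to the number of students expected.
    """
    issues = []
    # Documents scanned twice show up as repeated document IDs.
    for doc_id, n in Counter(d["doc_id"] for d in scanned_docs).items():
        if n > 1:
            issues.append(("duplicate_scan", doc_id))
    # Blank documents have no captured responses.
    for d in scanned_docs:
        if not d["responses"].strip():
            issues.append(("blank_document", d["doc_id"]))
    # Count each document once, then compare per-header counts to expectations.
    unique = {d["doc_id"]: d["header_id"] for d in scanned_docs}
    observed = Counter(unique.values())
    for header_id, expected in expected_counts.items():
        if observed.get(header_id, 0) != expected:
            issues.append(("n_count_mismatch", header_id))
    return issues


batch = [
    {"doc_id": "D1", "header_id": "H1", "responses": "ABCD"},
    {"doc_id": "D2", "header_id": "H1", "responses": ""},
    {"doc_id": "D3", "header_id": "H2", "responses": "BACD"},
    {"doc_id": "D3", "header_id": "H2", "responses": "BACD"},
]
issues = n_count_checks(batch, {"H1": 2, "H2": 2})
```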


Finally, accuracy checks were performed by checking random documents against scanned data to verify the accuracy 
of the scanning process. 


Once all corrections were made, the scan program was tested a second time to verify all data were valid. When the 
resulting output showed that no fields were flagged as suspect, the file was considered clean and scoring began. 
Once all scanning was completed, the right/wrong response data were securely handed off. 


9.6 Quality Control of Answer Document Processing and Scoring 


Quality control of answer document processing and scoring involves all aspects of the scoring procedures, including 
key-based and rule-based machine scoring and handscoring for constructed-response items and performance tasks. 


For the 2015 operational administration, Pearson’s validation team prepared test plans used throughout the scoring 
process. Test plan preparation was organized around detailed specifications. 


Based on lessons learned from previous administrations, the following quality steps were implemented: 


• Raw score validation (e.g., score key validation; evidence statement, field-test non-score; double-grid
combinations; possible correct combination, if applicable; out-of-range/negative test cases)
• Matching (e.g., validation of high-confidence criteria, low-confidence criteria, cross document, external or
forced matching by customer; prior to and after data updates; extract file of matched and unmatched
documents)
• Demographic update tests (e.g., verification of data extract against corresponding layout; valid values for
updatable fields; invalid values for updatable/non-updatable fields; negative test for non-existing record or
empty file)


The following components were added to the quality control process specifically for the program. These additional
steps were introduced to address issues with item-level scoring that were identified in the 2014 field-test
administration:


• XML Validation: A combination of automated validation against 100 percent of item XMLs and human
inspection of XML from selected difficult item types or composite items.
• Administration/End-to-End Data Validation: An automated generation of response data from approved test
maps that have known conditions against the operational scoring systems and data generation systems to
verify scoring accuracy.
• Psychometric Validation: Verification of data integrity using criteria typically used in psychometric processes
(e.g., statistical keychecks) and categorization of identified issues to help inform investigation by other
groups.
• Content Validation: An examination, by subject matter experts, of all items using a combination of
automated tools to generate response and scoring data.
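

The idea behind end-to-end data validation can be shown with a minimal sketch. Everything here is an assumption made for illustration: the test map layout, the `score_response` stand-in for the operational scoring system, and the choice of known conditions (a keyed response must earn full credit and a deliberately invalid response must earn zero).

```python
def score_response(item, response):
    """Stand-in for the operational scoring system: simple key-based scoring."""
    return item["max_points"] if response == item["key"] else 0


def end_to_end_check(test_map, scorer):
    """Generate known-condition responses from a test map and verify the scores.

    Returns the item IDs for which the scorer fails either known condition.
    """
    failures = []
    for item in test_map:
        wrong = "__INVALID__"  # assumed never to equal a real key
        full_credit_ok = scorer(item, item["key"]) == item["max_points"]
        zero_credit_ok = scorer(item, wrong) == 0
        if not (full_credit_ok and zero_credit_ok):
            failures.append(item["uin"])
    return failures


# Hypothetical two-item test map
test_map = [
    {"uin": "M001", "key": "B", "max_points": 1},
    {"uin": "M002", "key": "A,C", "max_points": 2},
]
failures = end_to_end_check(test_map, score_response)
```

A scorer that never awards credit would fail the full-credit condition on every item, so both item IDs would be returned for investigation.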


In addition to the steps described above, the following quality control process for answer keys and scoring, first
implemented for the first operational administration, was used:

1. Pearson’s psychometrics team conducted empirical analyses based on preliminary data files and flagged
items based on statistical criteria;
2. Pearson’s content team reviewed the flagged items and provided feedback on the accuracy of content,
answer keys, and scoring;
3. Items potentially requiring changes were added to the product validation (PV) log for further investigation
by other Pearson teams;
4. Staff were notified of items for which key or scoring changes were recommended;
5. Participating states and agencies approved or rejected scoring changes; and
6. All approved scoring changes were implemented and validated prior to the generation of the data files used
for psychometric processing.


9.7 Quality Control of Psychometric Processes 


High quality psychometric work for the operational administrations was necessary to provide accurate and reliable
results of student performance. Pearson and HumRRO implemented quality control procedures to ensure the quality
of the work, including:

1. Well-defined psychometric specifications
2. Consistently applied data cleaning rules
3. Clear and frequent communication
4. Test run analyses
5. Quality checks of the analyses
6. Checklists for statistical procedures


9.7.1 Pearson Psychometric Quality Control Process


Pearson was responsible for the psychometric analyses of the operational administration and implemented 
measures to ensure the quality of work. The psychometric analyses were all conducted according to well-defined 
specifications. Data cleaning rules were clearly articulated and applied consistently throughout the process. Results 
from all analyses underwent comprehensive quality checks by a team of psychometricians and data analysts. 
Detailed checklists were used by members of the team for each statistical procedure. 


Described below is an overview of the quality control steps performed at different stages of the psychometric 
analyses. Greater detail is provided in Sections 5 (Classical Item Analysis), 6 (Differential Item Functioning), 7 (IRT 
Calibration and Scaling), and 12 (Scale Scores). 


Data Screening
Data screening is an important first step to ensure quality data input for meaningful analysis. The Pearson Customer
Data Quality (CDQ) team validated all student data files used in the operational psychometric analyses. The data
validation for the student data files (SDF) and item response files (IRF) included the following steps:


1. Validated variables in the data file for values in acceptable ranges. 


2. Validated that the test form ID, unique item numbers (UINs), and item sequence on the data file were 
consistent with the test form values on the corresponding test map. 


3. Computed the composite raw score, claim raw scores, and subclaim raw scores, given the item scores in the 
student data file. 


4. Compared computed raw scores to the raw scores in the student data file. 
5. Compared the student item response block (SIRB) to the item scores. 


6. Flagged student records with inconsistencies for further investigation. 
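

Several of the validation steps above amount to recomputing values and comparing them with what the file reports. The sketch below illustrates that pattern for a single record. The record and test map layouts, key names, and flag labels are hypothetical and are not taken from the CDQ team's specifications; claim and subclaim scores and the SIRB comparison are omitted for brevity.

```python
def screen_record(record, test_map):
    """Return a list of flags for one student record; an empty list means clean.

    record: dict with hypothetical keys 'form_id', 'item_scores'
        (UIN -> points earned), and 'raw_score' (the reported composite).
    test_map: dict with 'form_id' and 'items' (UIN -> maximum points),
        listed in form sequence.
    """
    flags = []
    # Form ID, UINs, and item sequence should agree with the test map.
    if record["form_id"] != test_map["form_id"]:
        flags.append("form_id_mismatch")
    if list(record["item_scores"]) != list(test_map["items"]):
        flags.append("item_sequence_mismatch")
    # Item scores should fall in the acceptable range for each item.
    for uin, score in record["item_scores"].items():
        max_points = test_map["items"].get(uin)
        if max_points is None or not 0 <= score <= max_points:
            flags.append(f"score_out_of_range:{uin}")
    # The recomputed composite raw score should equal the reported one.
    if sum(record["item_scores"].values()) != record["raw_score"]:
        flags.append("raw_score_mismatch")
    return flags


test_map = {"form_id": "F1", "items": {"U1": 1, "U2": 2, "U3": 4}}
clean = {"form_id": "F1", "item_scores": {"U1": 1, "U2": 0, "U3": 3}, "raw_score": 4}
suspect = {"form_id": "F1", "item_scores": {"U1": 1, "U2": 3, "U3": 3}, "raw_score": 6}
clean_flags = screen_record(clean, test_map)
suspect_flags = screen_record(suspect, test_map)
```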


Pearson Psychometrics and HumRRO established predefined valid case criteria, which were implemented 
consistently throughout the process. Refer to Section 5.2 for rules for inclusion of students in analyses and Section 
7.2 for IRT calibration data preparation criteria and procedures. 


Classical Item Analysis
Classical item analysis (IA) produces item level statistics (e.g., item difficulty and item-total correlations). The IA
results were reviewed by Pearson psychometricians. Items flagged for unusual statistical properties were reviewed
by the content team. A subset of items identified as having key issues, scoring issues, or content issues was 
presented to the Priority Alert Task Force, which made decisions on whether to exclude them from the IRT 
calibration process and, consequently, the calculation of reported student scores. Refer to Section 5.4 for classical IA 
item flagging criteria. 
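

For readers unfamiliar with these statistics, the sketch below computes two of them on made-up data. It assumes item difficulty is the mean item score divided by the maximum possible points and uses an uncorrected Pearson item-total correlation; the operational definitions and flagging criteria are those in Section 5, which this example does not reproduce.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; assumes neither list has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_statistics(scores, max_points):
    """scores: per-student lists of item scores; max_points: per-item maxima."""
    totals = [sum(student) for student in scores]
    stats = []
    for j, max_pts in enumerate(max_points):
        item = [student[j] for student in scores]
        difficulty = sum(item) / (len(item) * max_pts)
        stats.append({"difficulty": difficulty,
                      "item_total_r": pearson(item, totals)})
    return stats


# Hypothetical responses: four students, three one-point items
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
stats = item_statistics(scores, max_points=[1, 1, 1])
```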


Calibrations
Creation of item response theory (IRT) sparse data matrices is an important step before the calibrations can begin.
Using the same scored item response data, Pearson and HumRRO teams filtered the data and generated their own
sparse data matrices independently. Processing of all data was done in parallel by two psychometricians and 
compared for number of students. This verification of the data preparation was important to ensure that student 
exclusion rules were applied consistently across the analyses. 


During the calibration process, checks were made to ensure that the correct options for the analyses were selected. 
Checks were also made on the number of items, number of students with valid scores, IRT item difficulties, standard
errors for the item difficulties, and the consistency between selected IRT statistics and the corresponding statistics
obtained during item analyses. Psychometricians also performed detailed reviews of statistics to investigate the 
extent to which the assumptions of the model fit the observed data. Refer to Section 7.4 for IRT model fit evaluation 
criteria. 


Scaling
During the scaling process, checks were made on the number of linking items, the number of items that were
excluded from linking during the stability check of the scaling process, and the scaling constants. Linking items that
did not meet the anchor criteria were excluded as linking items. Additionally, items with large weighted root mean 
square difference (WRMSD) values in Round 1 of scaling were excluded as linking items in Round 2. Finally, reviewers 
computed the linking constants and then checked that the linking constants were correctly applied. Refer to Section 
7.6 for a description of the scaling process. 


Conversion Tables
Conversion tables must be accurate because they are used to generate reported scores for students. Comprehensive
records were meticulously maintained on item-level decisions, and thorough checks were made to ensure that the
correct items were included in the final score. All conversion tables were processed in parallel by Pearson and 
HumRRO and completely matched. A reasonableness check was also conducted by psychometricians for each 
content and grade level to make sure the results were in alignment with observations during the analyses prior to 
conversion table creation. Refer to Section 12.3 for the procedure to create conversion tables. 


Delivering Item Statistics
Item statistics based on classical item analyses and IRT analyses were obtained during the psychometric analysis
process. The statistics were compiled by two data analysts independently to ensure that the correct statistics were
delivered for the item bank. 


9.7.2 HumRRO Psychometric Quality Control Process 


HumRRO served as the psychometric replicator for the operational administration. HumRRO replicated the IRT
analyses, the scaling analyses, and the creation of the conversion files. The following steps outline the replication process:


1. Calibrated online data.
2. Sent the item parameter estimates and scaling constants to Pearson for comparison.
3. Reconciled differences, if any, in results with Pearson.
4. Sent data files to Pearson for comparison and reconciled differences, if any.
5. Generated the performance levels, summative, claim, and subclaim conversion tables.
6. Sent conversion tables to Pearson for comparison and reconciled differences, if any.
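

The comparison steps in this replication process can be pictured as a tolerance check between two independently produced sets of results. The sketch below is an illustration only: the data layout, parameter names, and the tolerance value are assumptions and are not taken from the Pearson or HumRRO specifications.

```python
def compare_estimates(primary, replication, tolerance=1e-3):
    """Return discrepancies between two independently produced sets of estimates.

    primary, replication: dicts mapping item ID -> dict of parameter name -> value.
    tolerance is an assumed threshold, not one stated in the report.
    """
    discrepancies = []
    for uin in sorted(set(primary) | set(replication)):
        # An item present in only one set is itself a discrepancy.
        if uin not in primary or uin not in replication:
            discrepancies.append((uin, "missing", None))
            continue
        for name, value in primary[uin].items():
            other = replication[uin].get(name)
            if other is None or abs(value - other) > tolerance:
                discrepancies.append((uin, name, other))
    return discrepancies


# Hypothetical item parameter estimates from the two teams
primary_run = {"U1": {"a": 1.10, "b": -0.25}, "U2": {"a": 0.80, "b": 0.40}}
replication_run = {"U1": {"a": 1.10, "b": -0.25}, "U2": {"a": 0.80, "b": 0.47}}
to_reconcile = compare_estimates(primary_run, replication_run)
```

Any entries returned would be the starting point for the reconciliation described in the steps above.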
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Section 10: Operational Test Forms 


Each operational test form is constructed to reflect the full New Meridian blueprint. Multiple operational forms are 
constructed for each grade/subject. Core forms are the operational test forms consisting of only those items that will 
count toward a student’s score. Core forms are constructed to meet the blueprint and psychometric properties 
outlined in the test construction specifications. New Meridian creates multiple core forms for a given assessment to 
enhance test security and to support opportunity for item release. The number of core operational forms per 
grade/subject and mode is provided in Table 10.1. 


Table 10.1 Number of Core Operational Forms per Grade/Subject and Mode for ELA/L and Mathematics 


Grade/Subject                  ELA/L          Mathematics
                               CBT    PBT     CBT    PBT
Grade 3                        2      1       2      1
Grade 4                        2      1       2      1
Grade 5                        2      1       2      1
Grade 6                        2      1       2      1
Grade 7                        2      1       2      1
Grade 8                        2      1       2      1
Grade 9                        2      1
Grade 10                       2      1
Grade 11                       2      1
Algebra I                                     2      1
Geometry                                      2      1
Algebra II                                    2      1
Integrated Mathematics I                      1      1
Integrated Mathematics II                     1      1
Integrated Mathematics III                    1      1


CBT = computer-based test; PBT = paper-based test 


In addition to the operational core forms, appropriate forms were identified as accessibility and accommodated
forms. Grades 3-11 ELA/L and Integrated Mathematics I, II, and III have two operational accommodated forms, and
mathematics grades 3-8 and the high school traditional assessments have three accommodated forms. The forms
are accommodated to support Braille, large print, human reader/human signers, assistive technology,
text-to-speech, closed captioning, and Spanish. Human reader/human signers and Spanish are provided for mathematics
assessments only. Closed captioning is provided for ELA/L assessments only.


The summative assessments were administered in either a computer-based test (CBT) or a paper-based test (PBT)
format. ELA/L assessments focused on writing effectively when analyzing text. Mathematics assessments focused on
applying skills and concepts, and featured multi-step problems that require abstract reasoning and modeling of
real-world problems. In both content areas, students also demonstrated their acquired skills and knowledge by
answering selected-response items and fill-in-the-blank questions. Each assessment was composed of multiple units;
one of the mathematics units was split into calculator and non-calculator sections.


Section 11: Student Characteristics


11.1 Overview of Test Taking Population 


Approximately half a million students from the District of Columbia, Department of Defense Education Activity, and 
Maryland participated in the operational administration of the summative assessments during the 2018-2019 school 
year. Not all participating states and agencies had students testing in all grades. Assessments were administered for 
English language arts/literacy (ELA/L) in grades 3 through 11; mathematics assessments were administered in grades 
3 through 8, as well as for traditional high school mathematics (Algebra I, Geometry, and Algebra II) and integrated
high school mathematics (Integrated Mathematics I, II, and III). However, no students took Integrated Mathematics
III. A small subset of students tested in ELA/L grades 9, 10, and 11, and Algebra I, Geometry, and Algebra II during fall
of 2018. Student characteristics for this group are presented in an addendum. The majority of students tested during 
the spring administration when all grades and content areas were administered online and on paper. 


11.2 Rules for Inclusion of Students in Analyses 


Criteria for inclusion of students were implemented prior to all operational analyses. These rules were established by 
Pearson psychometricians in consultation with participating states and agencies to determine which, if any, student 
records should be removed from analyses. This data screening process resulted in higher quality, albeit slightly 
smaller, data sets. 


Student response data were included in analyses if: 


1. Valid form numbers were observed for each unit for online assessments or for the full form for paper 
assessments, 


2. Student records were not flagged as “void” (i.e., do not score), and 


3. The student attempted at least 25 percent of the items in each unit or form. 


Additionally, in cases where students had more than one valid record, the record with the higher raw score was 
chosen. Records for students with administration issues or anomalies were excluded from analyses. 
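

The inclusion rules above can be sketched in code. This is a minimal illustration, not the operational screening program: the `Record` fields and the function names are hypothetical, and the sketch assumes that the attempt rate is available as one fraction per unit (or a single fraction for a paper form).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    # All field names are hypothetical; they are not taken from the report.
    student_id: str
    valid_form: bool                  # valid form number for every unit (or the full paper form)
    void: bool                        # flagged "void" (do not score)
    attempted_fraction: List[float]   # fraction of items attempted, one value per unit or form
    raw_score: int
    anomaly: bool = False             # administration issue or anomaly


def is_eligible(rec: Record, min_attempt: float = 0.25) -> bool:
    """Apply the three inclusion rules plus the anomaly exclusion."""
    return (
        rec.valid_form
        and not rec.void
        and not rec.anomaly
        and all(f >= min_attempt for f in rec.attempted_fraction)
    )


def select_records(records: List[Record]) -> List[Record]:
    """Keep eligible records; for duplicate students keep the higher raw score."""
    best = {}
    for rec in records:
        if not is_eligible(rec):
            continue
        current = best.get(rec.student_id)
        if current is None or rec.raw_score > current.raw_score:
            best[rec.student_id] = rec
    return list(best.values())
```

In this sketch the eligibility screen runs before duplicate resolution, so a void record with a high raw score can never displace a valid record for the same student.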


11.3 Students by Grade/Course, Mode, and Gender


Table 11.1 presents, for each grade of ELA/L, the number and percentage of students who took the test in each 
mode (CBT or PBT). This information is provided for all participating states combined. Table 11.2 presents the same 
type of information for all students who took the mathematics assessments, and Table 11.3 provides this 
information for students who took the mathematics assessments in Spanish. 


Markedly more students tested online than on paper across all grades for both content areas. For ELA/L, the
percentages of online students by grade level, for all states combined, ranged from 99.8 percent to 100 percent,
while the percentages of paper test students ranged from 0 percent to 0.2 percent. For all mathematics students, the
percentages of students testing online ranged from 99.3 percent to 100 percent, whereas the percentages of
students testing on paper ranged from 0 percent to 0.7 percent. The percentages of students taking
Spanish-language mathematics online forms ranged from 41.3 percent to 100 percent, and the percentages of students
taking Spanish-language mathematics paper forms ranged from 0 percent to 58.7 percent. No testers took Integrated
Mathematics III in the spring administration.


Table 11.1 ELA/L Students by Grade and Mode: All States Combined 


               No. of Valid    CBT                  PBT
Grade          Cases           N          %         N        %
3              72,442          72,319     99.8      123      0.2
4              74,167          74,069     99.9      98       0.1
5              75,421          75,297     99.8      124      0.2
6              78,755          78,665     99.9      90       0.1
7              75,084          74,990     99.9      94       0.1
8              72,619          72,554     99.9      65       0.1
9              3,388           3,388      100.0     n/r      n/r
10             73,487          73,325     99.8      162      0.2
11             60              60         100.0     n/r      n/r
Grand Total    525,423         524,667    99.9      756      0.1


Note: Includes students taking accommodated forms of ELA/L. CBT = computer-based test; PBT = paper-based test; 
n/r = not reported due to n<20.


Table 11.2 Mathematics Students by Grade/Course and Mode: All States Combined


               No. of Valid    CBT                  PBT
Grade/Course   Cases           N          %         N        %
3              79,070          78,922     99.8      148      0.2
4              80,595          80,492     99.9      103      0.1
5              81,489          81,339     99.8      150      0.2
6              78,665          78,572     99.9      93       0.1
7              62,845          62,759     99.9      86       0.1
8              39,807          39,763     99.9      44       0.1
A1             84,749          84,161     99.3      588      0.7
GO             13,390          13,378     99.9      n/r      n/r
A2             5,129           5,128      100.0     n/r      n/r
M1             144             144        100.0     n/r      n/r
M2             190             190        100.0     n/r      n/r
M3             n/a             n/a        n/a       n/a      n/a
Grand Total    526,073         524,848    99.8      1,225    0.2


Note: Includes students taking mathematics in English, students taking Spanish-language forms for mathematics,
and students taking accommodated forms. CBT = computer-based test; PBT = paper-based test; A1 = Algebra I,
GO = Geometry, A2 = Algebra II, M1 = Integrated Mathematics I, M2 = Integrated Mathematics II, M3 = Integrated
Mathematics III; n/a = not applicable; n/r = not reported due to n<20.


Table 11.3 Spanish-Language Mathematics Students by Grade/Course and Mode: All States Combined 


               No. of Valid    CBT                  PBT
Grade/Course   Cases           N          %         N        %
3              139             138        99.3      n/r      n/r
4              147             147        100.0     n/r      n/r
5              152             152        100.0     n/r      n/r
6              112             110        98.2      n/r      n/r
7              139             138        99.3      n/r      n/r
8              149             149        100.0     n/r      n/r
A1             705             291        41.3      414      58.7
GO             59              59         100.0     n/r      n/r
A2             n/r             n/r        n/r       n/r      n/r
M1             n/r             n/r        n/r       n/r      n/r
M2             n/r             n/r        n/r       n/r      n/r
M3             n/a             n/a        n/a       n/a      n/a
Grand Total    1,602           1,184      73.9      418      26.1


Note: CBT = computer-based test; PBT = paper-based test; A1 = Algebra I, GO = Geometry, A2 = Algebra II, M1 =
Integrated Mathematics I, M2 = Integrated Mathematics II, M3 = Integrated Mathematics III; n/a = not applicable;
n/r = not reported due to n<20.


Tables A.11.1, A.11.2, and A.11.3 in Appendix 11 show the number and percentage of students with valid test scores
in each content area (including Spanish-language mathematics), grade/course, and mode of assessment for all states
and agencies combined and for each state or agency separately. Tables A.11.4, A.11.5, and A.11.6 present the
distribution by content area, grade/course, mode, and gender, for all states combined.


11.4 Demographics 


Also presented in Appendix 11 is student demographic information for the following characteristics: economically 
disadvantaged, students with disabilities, English learners (EL), gender, and race/ethnicity (American Indian/Alaska 
Native; Asian; Black/African American; Hispanic/Latino; White/Caucasian; Native Hawaiian or Other Pacific Islander; 
two or more races reported; race not reported). Student demographic information was provided by the states and
districts and captured in PearsonAccessnext by means of a student data upload. The demographic data were verified by
the states and districts prior to score reporting.


Tables A.11.7 through A.11.15 provide demographic information for students with valid ELA/L scores, and Tables 
A.11.16 through A.11.26 present demographics for students with valid mathematics scores. All tables of 
demographic information are organized by grade/course; the results are first aggregated across all participating 
states and agencies and then presented for each state or agency. Percentages are not reported where fewer than
20 students tested in a grade/course.


New Meridian February 28, 2020 Page 90 


2019 Technical Report 


Section 12: Scale Scores 


Participating states and agencies report results according to five performance levels that delineate the knowledge, 
skills, and practices students are able to demonstrate: 


• Level 5: Exceeded expectations
• Level 4: Met expectations
• Level 3: Approached expectations
• Level 2: Partially met expectations
• Level 1: Did not yet meet expectations


The assessments are designed to measure and report results in categories called master claims and subclaims. 
Master claims (or simply “claims”) are at a higher level than subclaims with content representing multiple subclaims 
contributing to each claim outcome. In addition, four scale scores are reported for the assessments.³ A summative
scale score is reported for each mathematics assessment. A summative scale score and separate claim scores for 
Reading and Writing are reported for each English language arts/literacy (ELA/L) assessment. 


Subclaim outcomes describe student performance for content-specific subsets of the item scores contributing to a 
particular claim. For example, Written Expression and Knowledge of Conventions subclaim outcomes are reported 
along with Writing claim scores. Subclaim outcomes are reported as Below Expectations, Nearly Meets Expectations, 
or Meets or Exceeds Expectations. 


12.1 Operational Test Content (Claims and Subclaims) 


A claim is a statement about student performance based on how students respond to test questions. The tests are
designed to elicit evidence from students that supports valid and reliable claims about the extent to which they are 
college and career ready or on track toward that goal and are making expected academic gains based on the 
Common Core State Standards (CCSS). 


The number of items associated with each claim and subclaim outcome varies depending on subject and grade. The 
item types vary in terms of the number of points associated with them, so that both the number of items and the 
number of points are important in evaluating the quality of a claim or subclaim score. 


12.1.1 English Language Arts/Literacy 


Table 12.1⁴ includes the number of items and the number of points by subclaim and claim for ELA/L grade 3.
Corresponding information is provided in Appendix 12.1 for all ELA/L grades.


3 Addendum 12 presents a summary of results on scale scores for the fall 2018 administration.
4 Table A.12.1 in Appendix 12.1 is identical to Table 12.1.


Table 12.1 Form Composition for ELA/L Grade 3


Claims             Subclaims                     Number of Items    Number of Points
Reading            Reading Literary Text         9-12               19-25
                   Reading Informational Text    9-12               19-25
                   Vocabulary                    4-7                8-14
                   Claim Total                   22                 46
Writing            Written Expression            2                  27
                   Knowledge of Conventions      1                  9
                   Claim Total                   3                  36
SUMMATIVE TOTAL                                  23                 82


Note: Each prose constructed-response (PCR) trait is identified as a separate item in this table for the two writing 
subclaims and, in some cases, either the Reading Literary Text or the Reading Informational Text subclaim. 


Each ELA/L form contains items of varying types. The prose constructed-response (PCR) traits contribute to different 
claims and the aggregate of the traits contributes to the summative scale score. The following details the number of 
possible points and the associated subclaims for the three PCR tasks: 


• Literary Analysis Task
• Research Simulation Task
• Narrative Writing Task


The Literary Analysis Task and the Research Simulation Task are scored for two traits: Reading Comprehension and 
Written Expression, and Knowledge of Conventions. The Narrative Writing Task is scored for two traits: Written 
Expression and Knowledge of Conventions. All traits are initially scored as either 0-3 or 0-4; the Written Expression
traits are multiplied by 3 (or weighted) to increase their contribution to the total score, making possible subclaim 
scores 0, 3, 6, and 9, or 0, 3, 6, 9, and 12. The maximum possible points for ELA/L PCR items are provided in Table 
12.2. 
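

As a hedged illustration of the weighting described above, the sketch below computes the point total for one PCR task from raw trait scores. The function and constant names are hypothetical; only the weight of 3 on Written Expression comes from the text.

```python
WRITTEN_EXPRESSION_WEIGHT = 3  # multiplier applied to the Written Expression trait


def pcr_task_points(reading: int, written_expression: int, conventions: int) -> int:
    """Weighted point total for one PCR task, given raw trait scores.

    Pass reading=0 for the Narrative Writing Task, which has no Reading trait.
    """
    return reading + WRITTEN_EXPRESSION_WEIGHT * written_expression + conventions


# Possible weighted Written Expression scores for a trait initially scored 0-3
possible_we_scores = [WRITTEN_EXPRESSION_WEIGHT * s for s in range(4)]
```

With maximum raw trait scores of 3, 3, and 3, the function returns 15, which matches the grade 3 Literary Analysis Task total in Table 12.2.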


Table 12.2 Contribution of Prose Constructed-Response Items to ELA/L 


                                        Possible Points
Grade    Score                          Literary Analysis    Research Simulation    Narrative Writing
                                        Task                 Task                   Task
3        Reading                        3                    3                      0
         Written Expression             9                    9                      9
         Knowledge of Conventions       3                    3                      3
         Total                          15                   15                     12
4-5      Reading                        4                    4                      0
         Written Expression             12                   12                     9
         Knowledge of Conventions       3                    3                      3
         Total                          19                   19                     12
6-11     Reading                        4                    4                      0
         Written Expression             12                   12                     12
         Knowledge of Conventions       3                    3                      3
         Total                          19                   19                     15


12.1.2 Mathematics


Table 12.3⁵ includes the numbers of items and points associated with subclaim scores for mathematics grade 3, as
an example of the composition of the mathematics tests. 


Table 12.3 Mathematics Form Composition for Grade 3 


Subclaims                              Number of Items    Number of Points
Mathematics
  Major Content                        26                 30
  Additional & Supporting Content      10                 10
  Expressing Mathematical Reasoning    4                  14
  Modeling and Applications            3                  12
TOTAL                                  43                 66


Because there is substantial variation in the composition of the tests, corresponding information is provided in the 
tables in Appendix 12.1 for all mathematics grades/courses. 


12.2 Establishing the Reporting Scales 


Reporting scales classify student performance into one of five performance levels,⁶ with Level 1 indicating the
lowest level of performance and Level 5 indicating the highest level of performance. Threshold or cut scores 
associated with performance levels were initially expressed as raw scores on the performance level setting (PLS) 
forms approved by the Governing Board. A scale score task force was assembled, which made recommendations 
about how threshold levels would be represented on the reporting scale. 


12.2.1 Summative Score Scale and Performance Levels 


There are 201 defined summative scale score points for both ELA/L and mathematics, ranging from 650, the lowest
obtainable scale score, to 850, the highest obtainable scale score. The scale score task force recommended the
Level 2 and Level 4 thresholds as the anchor points for the summative scale; these two cuts anchor the linear
transformation between the theta scale and the reported scale score. A scale score of 700 is associated with
minimum Level 2 performance, and a scale score of 750 is associated with minimum Level 4 performance. Not all
possible scale scores may be realized in a scoring table.


5 Table A.12.10 in Appendix 12.1 is identical to Table 12.3.
6 Section 8 provides an overview of the performance level setting process, and detailed information can be found in the
Performance Level Setting Technical Report.


For spring 2015, scale scores were defined for each test as a linear transformation of the theta ($\theta_{2015}$) scale. The
theta values associated with the Level 2 and Level 4 performance levels were identified using the test characteristic
curve associated with the performance level setting form. With Levels 2 and 4 scale scores fixed at 700 and 750,
respectively, the relationship between theta ($\theta_{2015}$) and scale scores ($\mathrm{ScaleScore}_{2015}$) was established as

$$\mathrm{ScaleScore}_{2015} = A_{2015} \times \theta_{2015} + B_{2015} \tag{12-1}$$

where $A_{2015}$ is the slope and $B_{2015}$ is the intercept. The slope and intercept were established as

$$A_{2015} = \frac{750 - 700}{\theta_{2015,\,\mathrm{Level\,4}} - \theta_{2015,\,\mathrm{Level\,2}}} \tag{12-2}$$

and

$$B_{2015} = 750 - A_{2015} \times \theta_{2015,\,\mathrm{Level\,4}} \tag{12-3}$$

As indicated by these formulas, the slope and intercept for the summative scale scores were based on the theta
scale, and by default the IRT parameter scale, established in 2015. Since the spring 2016 IRT parameter scale is the
base scale for the IRT parameters, the scaling constants $A_{2015}$ and $B_{2015}$ were updated in order to continue
reporting performance levels, summative scale scores, claim scores, and subclaim performance levels on the same
scale as 2015. Maintaining the 2015 scale allows prior-year scores to be compared with current and future scores,
and it maintains the performance level cut scores.


New scaling constants for the summative scale score were needed for the linear transformation of the theta scale
($\theta_{2016}$) to the 2015 reporting scale ($\mathrm{ScaleScore}_{2015}$):

$$\mathrm{ScaleScore}_{2015} = SA_{2016} \times \theta_{2016} + SB_{2016} \tag{12-4}$$

The slope ($\mathrm{slope}_{2015\_to\_2016}$) and intercept ($\mathrm{intercept}_{2015\_to\_2016}$) generated during the year-to-year linking
defined the linear relationship between the 2015 theta scale ($\theta_{2015}$) and the 2016 theta scale ($\theta_{2016}$). These
values were included in the scale score formula, and the formulas were used to solve for the slope ($SA_{2016}$) and
intercept ($SB_{2016}$) for 2016.

The slope ($SA_{2016}$) was updated using the following formula:

$$SA_{2016} = \frac{A_{2015}}{\mathrm{slope}_{2015\_to\_2016}} \tag{12-5}$$

where $A_{2015}$ is the current scale score multiplicative constant, $\mathrm{slope}_{2015\_to\_2016}$ is the multiplicative coefficient from
the year-to-year linking, and $SA_{2016}$ is the scale score slope constant for 2016 and beyond.

The intercept ($SB_{2016}$) was updated using the following formula:

$$SB_{2016} = B_{2015} - SA_{2016} \times \mathrm{intercept}_{2015\_to\_2016} \tag{12-6}$$

where $B_{2015}$ is the current scale score additive constant, $SA_{2016}$ is the updated scale score slope, and $SB_{2016}$ is the
scale score intercept constant for 2016 and beyond.


In addition, new scaling constants for the reading and writing claim scales were needed. The same formulas were
applied by replacing the slope ($A_{2015}$) and intercept ($B_{2015}$) with the reading claim slope and intercept and the
writing claim slope and intercept.
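

The update in equations 12-5 and 12-6 can be checked numerically. The sketch below assumes that the year-to-year linking is expressed as theta_2016 = slope × theta_2015 + intercept; this direction is inferred from the form of the two equations rather than stated in the text. All numeric values and names are hypothetical.

```python
def update_constants(a_2015, b_2015, link_slope, link_intercept):
    """Return (SA_2016, SB_2016) following equations 12-5 and 12-6."""
    sa_2016 = a_2015 / link_slope
    sb_2016 = b_2015 - sa_2016 * link_intercept
    return sa_2016, sb_2016


# Hypothetical constants, for illustration only
a_2015, b_2015 = 33.0, 735.0
link_slope, link_intercept = 1.05, 0.10
sa_2016, sb_2016 = update_constants(a_2015, b_2015, link_slope, link_intercept)

# Assumed linking direction: theta_2016 = link_slope * theta_2015 + link_intercept
theta_2015 = 0.4
theta_2016 = link_slope * theta_2015 + link_intercept

score_via_2015 = a_2015 * theta_2015 + b_2015      # equation 12-1
score_via_2016 = sa_2016 * theta_2016 + sb_2016    # equation 12-4
```

Under that assumption the two routes give the same scale score, which is the sense in which the updated constants keep results on the 2015 reporting scale.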


A and B values resulting from these calculations as well as the theta values associated with the threshold 
performance levels are included in Appendix 12.2. Also, the 2015-2016 technical report includes raw to scale score 
conversion tables for the performance level setting forms. 


12.2.2 ELA/L Reading and Writing Claim Scale 


There are 81 defined scale score points possible for Reading, ranging from 10 to 90, and 51 defined scale score
points possible for Writing, ranging from 10 to 60. The scale score task force recommended the Level 2 and Level 4
thresholds as the anchor points for both claim scales. For Reading, a scale score of 30 is associated with minimum
Level 2 performance, and a scale score of 50 is associated with minimum Level 4 performance. For Writing, a scale
score of 25 is associated with minimum Level 2 performance, and a scale score of 35 is associated with minimum
Level 4 performance. Not all possible scale scores may be realized in a scoring table.


As with the summative scale scores, scale scores for Reading and Writing were defined for each test as a linear
transformation of the IRT theta ($\theta$) scale. The same IRT theta scale was used for Reading and Writing as was used for
the ELA/L summative scores. The theta values associated with the Level 2 and Level 4 performance levels were
identified using the test characteristic curve associated with the performance level setting form. As with the
summative scores, the relationship between theta and scale scores was established with Level 2 and Level 4 theta
scores and the corresponding predefined scale scores. The formulas used for this are provided in Table 12.4.


Table 12.4 Calculating Scaling Constants for Reading and Writing Claim Scores 


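Reading

$$\mathrm{Scale} = A_R \times \theta + B_R, \qquad A_R = \frac{50 - 30}{\theta_{\mathrm{Level\,4}} - \theta_{\mathrm{Level\,2}}}, \qquad B_R = 50 - A_R \times \theta_{\mathrm{Level\,4}}$$

Writing

$$\mathrm{Scale} = A_W \times \theta + B_W, \qquad A_W = \frac{35 - 25}{\theta_{\mathrm{Level\,4}} - \theta_{\mathrm{Level\,2}}}, \qquad B_W = 35 - A_W \times \theta_{\mathrm{Level\,4}}$$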


A and B values resulting from these calculations are included in Appendix 12.2. 


12.2.3 Subclaims Scale 


The Level 4 cut is defined as Meets or Exceeds Expectations because high school students at Level 4 or above are 
likely to have the skills and knowledge to meet the definition of career and college readiness. The Level 3 cut is 
defined as Nearly Meets Expectations. Subclaim outcomes center on the Level 3 and Level 4 performance levels and 
are reported at three levels: 


• Below Expectations;
• Nearly Meets Expectations; or
• Meets or Exceeds Expectations.


The subclaim performance levels are designated through the IRT theta ($\theta$) scale for the items associated with a
particular subclaim. The theta values and corresponding raw scores associated with the Level 3 and Level 4
performance levels were identified using the test characteristic curve. Students earning a raw subclaim score equal
to or greater than the Level 4 threshold were designated as Meets or Exceeds Expectations. Students not earning a
raw subclaim score equal to or greater than the Level 3 threshold were designated as Below Expectations. Other
students whose raw subclaim score fell between the Level 3 and 4 thresholds were designated as Nearly Meets
Expectations.
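

The three-way designation described above amounts to two threshold comparisons. A minimal sketch follows; the function name and the example cut values are hypothetical, and the operational raw-score cuts come from the test characteristic curve for each subclaim.

```python
def subclaim_level(raw_score, level3_cut, level4_cut):
    """Map a raw subclaim score to one of the three reported outcomes."""
    if raw_score >= level4_cut:
        return "Meets or Exceeds Expectations"
    if raw_score >= level3_cut:
        return "Nearly Meets Expectations"
    return "Below Expectations"


# Hypothetical raw-score thresholds, for illustration only
outcomes = [subclaim_level(score, level3_cut=8, level4_cut=12) for score in (5, 8, 11, 12)]
```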


12.3 Creating Conversion Tables 


A conversion table relates the number of points earned by a student on the ELA/L summative score, the
mathematics summative score, the Reading claim score, or the Writing claim score to the corresponding scale score
for the test form administered to that student. An IRT inverse test characteristic curve (TCC) approach is used to
develop the relationship between point scores and theta, $\theta$ (IRT ability estimates). In carrying out the calculations,
estimates of item parameters and thetas are substituted for parameters in the formulas in each step.


Step 1: Calculate the expected item score (i.e., estimated item true score) for every theta in the selected range
(between -15 and +15, in 0.0001 increments) based on the generalized partial credit model for both dichotomous
and polytomous items:

$$s_i(\theta_j) = \sum_{m=0}^{m_i - 1} m \, p_{im}(\theta_j) \tag{12-7}$$

$$p_{im}(\theta_j) = \frac{\exp\left[\sum_{k=0}^{m} D a_i (\theta_j - b_i + d_{ik})\right]}{\sum_{v=0}^{m_i - 1} \exp\left[\sum_{k=0}^{v} D a_i (\theta_j - b_i + d_{ik})\right]} \tag{12-8}$$

where $D a_i (\theta_j - b_i + d_{i0}) = 0$; $s_i(\theta_j)$ is the expected item score for item $i$ on theta, $\theta_j$; $p_{im}(\theta_j)$ is the
probability of a student, $j$, with $\theta_j$ getting score $m$ on item $i$; $m_i$ is the number of score categories of item $i$,
with possible item scores as consecutive integers from 0 to $m_i - 1$; $D$ is the IRT scale constant (1.7); $a_i$ is a
slope parameter; $b_i$ is a location parameter reflecting overall item difficulty; $d_{ik}$ is a location parameter
incrementing the overall item difficulty to reflect the difficulty of earning score category $k$; and $v$ is the index over
score categories in the denominator.
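

Step 1 can be sketched in a few lines of Python. This is an illustrative implementation of equations 12-7 and 12-8 under the stated convention that the k = 0 term is zero; the function names and item parameters are hypothetical, and subtracting the largest exponent is a numerical-stability choice made for the sketch, not part of the report's procedure.

```python
import math

D = 1.7  # IRT scale constant given in the text


def gpcm_probs(theta, a, b, d):
    """Category probabilities p_im(theta) under the generalized partial credit model.

    d holds the category parameters d_0..d_{m_i - 1}; the k = 0 term of the
    cumulative sum is fixed at 0, so d[0] is ignored.
    """
    z = [0.0]  # cumulative exponents, one per score category
    for k in range(1, len(d)):
        z.append(z[-1] + D * a * (theta - b + d[k]))
    top = max(z)  # subtract the maximum before exponentiating to avoid overflow
    expz = [math.exp(v - top) for v in z]
    total = sum(expz)
    return [e / total for e in expz]


def expected_item_score(theta, a, b, d):
    """Expected item score s_i(theta) from equation 12-7."""
    return sum(m * p for m, p in enumerate(gpcm_probs(theta, a, b, d)))
```

For a dichotomous item the model reduces to a two-parameter logistic form, so the expected score at theta equal to b is 0.5.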


Step 2: Calculate the expected (weighted) test score for every theta in the selected range:

$$T_j = \sum_{i=1}^{I} w_i \, s_i(\theta_j) \tag{12-9}$$

where $T_j$ is the expected (weighted) test score on theta, $\theta_j$; $w_i$ is the item weight for item $i$ (e.g., with $w_i = 2$, a
dichotomous item is scored as 0 or 2, and a three-category item is scored as 0, 2, or 4); and $I$ is the total number of
items in a test form.


Step 3: Calculate the estimated conditional standard error of measurement (CSEM) for each theta in the selected
range:

$$\mathrm{CSEM}(\theta_j) = \frac{1}{\sqrt{\sum_{i=1}^{I} L_i(\theta_j)}} \tag{12-10}$$

$$L_i(\theta_j) = (D a_i)^2 \left[ s_{i2}(\theta_j) - s_i^2(\theta_j) \right] \tag{12-11}$$

$$s_{i2}(\theta_j) = \sum_{m=0}^{m_i - 1} m^2 \, p_{im}(\theta_j) \tag{12-12}$$

where $L_i(\theta_j)$ is the estimated item information function for item $i$ on theta, $\theta_j$.


Step 4: Match every raw score with a theta. $\theta_j$ is the theta for a raw score $T_r$ if $|T_j - T_r|$ is minimum across all $T_j$.


Step 5: Calculate the reported scale score. Using the A and B scaling constants in Appendix 12.2, convert each
theta value to a scale score and each theta CSEM to a scale score CSEM:

$$\mathrm{ScaleScore} = A \times \theta + B \tag{12-13}$$

$$\mathrm{CSEM}_{\mathrm{scale}} = \mathrm{CSEM}_{\theta} \times A \tag{12-14}$$

The scale scores are rounded to the nearest whole number, and CSEMs are rounded to the tenths place.
Furthermore, the scale scores are truncated at the lowest obtainable scale score (LOSS) of 650 and the highest
obtainable scale score (HOSS) of 850.
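

The five steps can be tied together in one short sketch. It is an illustration under several assumptions, not the operational program: the item parameters and scaling constants are hypothetical, the theta grid is coarser and narrower (-6 to +6 in steps of 0.01) than the range given in Step 1 so that it runs quickly, the CSEM is taken as the inverse square root of the summed item information, and Python's built-in `round` (which rounds exact halves to the nearest even number) stands in for the report's rounding rule.

```python
import math

D = 1.7  # IRT scale constant given in the text


def gpcm_probs(theta, a, b, d):
    """GPCM category probabilities; the k = 0 term is fixed at 0."""
    z = [0.0]
    for k in range(1, len(d)):
        z.append(z[-1] + D * a * (theta - b + d[k]))
    top = max(z)
    expz = [math.exp(v - top) for v in z]
    total = sum(expz)
    return [e / total for e in expz]


def conversion_table(items, A, B, loss=650, hoss=850, lo=-6.0, hi=6.0, step=0.01):
    """Return (raw score, scale score, scale-score CSEM) rows via the inverse TCC.

    items: list of dicts with keys a, b, d (category parameters), w (integer weight).
    """
    n_points = int(round((hi - lo) / step)) + 1
    grid = []
    for j in range(n_points):
        theta = lo + j * step
        tcc, info = 0.0, 0.0
        for it in items:
            p = gpcm_probs(theta, it["a"], it["b"], it["d"])
            s1 = sum(m * pm for m, pm in enumerate(p))       # Step 1, eq. 12-7
            s2 = sum(m * m * pm for m, pm in enumerate(p))   # eq. 12-12
            tcc += it["w"] * s1                              # Step 2, eq. 12-9
            info += (D * it["a"]) ** 2 * (s2 - s1 * s1)      # eq. 12-11
        grid.append((theta, tcc, 1.0 / math.sqrt(info)))     # Step 3, eq. 12-10
    max_raw = sum(it["w"] * (len(it["d"]) - 1) for it in items)
    table = []
    for raw in range(max_raw + 1):
        # Step 4: the grid theta whose expected test score is closest to the raw score
        theta, _, csem = min(grid, key=lambda g: abs(g[1] - raw))
        # Step 5: linear transformation, rounding, and truncation at LOSS and HOSS
        scale = min(hoss, max(loss, round(A * theta + B)))
        table.append((raw, scale, round(A * csem, 1)))
    return table


# Hypothetical two-item form and scaling constants, for illustration only
example_items = [
    {"a": 1.0, "b": -0.5, "d": [0.0, 0.0], "w": 1},
    {"a": 0.8, "b": 0.3, "d": [0.0, 0.5, -0.5], "w": 2},
]
example_table = conversion_table(example_items, A=33.0, B=735.0)
```

Because the expected test score never reaches exactly zero or the maximum, the lowest and highest raw scores match the ends of the theta grid, and truncation then assigns them the LOSS and HOSS in this example.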


Figure 12.1 contains TCCs, estimated CSEM curves, and estimated information (INF) curves for ELA/L grade 3.⁷ The
curves in each figure are for the two core online forms (O1 and O2), one core paper form (P1), and two or three
accommodated forms A(O). The curves are reported on the theta scale. Vertical dotted lines indicate the
performance level cuts on the theta scale. For ELA/L grade 3, all forms had very similar TCCs. CSEM and INF curves
were also similar. Appendix 12.3 contains the pre-equated TCC, CSEM, and INF curves for all ELA/L grades and all
mathematics grades/courses; these curves are based on IRT parameters from a prior operational or field-test
administration.


7 Grade 3 TCC, CSEM, and INF curves are also included in Appendix Figure A.12.1.


[Figure panel: ELA/L Grade 3]
Figure 12.1 Test Characteristic Curves, Conditional Standard Error of Measurement Curves, and Information
Curves for ELA/L Grade 3


12.4 Score Distributions 


12.4.1 Score Distributions for ELA/L 


Figures 12.2 through 12.4 graphically represent the distributions of scale scores for grades 3 through 11 ELA/L
summative, Reading, and Writing, respectively. The vertical axis of each graph, labeled “Density,” represents the
proportion of students earning the scale score point indicated along the horizontal axis. For the summative
distributions, the y-axis ranges from 0 to 0.02 and the x-axis from 650 to 850. For the Reading distributions, the y-axis
ranges from 0 to 0.05 and the x-axis from 10 to 90. For the Writing distributions, the y-axis ranges from 0 to 0.10 and
the x-axis from 10 to 60.


The distributions of the ELA/L summative scale scores were fairly symmetrical and centered around the Level 4 cut 
score (750), with the exception of grade 3, which was slightly less symmetrical, and grades 9 and 11, which were 
centered somewhat lower. 


Reading scale scores tended to be centered around or slightly below the Level 4 cut score of 50 and were slightly 
more irregular than the summative scale scores. Distributions tended to be fairly symmetric. Grade 11 was centered 
between 30 and 40. 


Writing scale score distributions were noticeably less smooth than Reading or ELA/L summative distributions due to
peaks related to the weighting of the Written Expression portion of the PCR tasks and a noticeable proportion of
students at the lowest obtainable scale score (LOSS). Because the Written Expression trait is weighted, many
Writing scale score values are unlikely to be obtained, which produces multiple peaks across the range of the Writing
scale score. A noticeable proportion of students earned the LOSS of ten in Writing across all ELA/L grades. Students
with zero raw score points on the written portion of the assessment are automatically assigned the LOSS value of the
scale. Writing items are embedded exclusively in PCR tasks, which tended to be difficult. The Written Expression trait
also tended to be the most difficult of the PCR traits.


Across the ELA/L grades, zero students obtained scale scores in the range of eleven to seventeen.⁸ As noted in
Section 12.2.2, the scale score task force selected ten as the LOSS. This value was selected to be consistent with the
Reading LOSS and to reduce truncation at the lower end of the scale. However, the scale is defined by the theta values
associated with the Level 2 and Level 4 performance levels. All other scale score values are identified through a
theta-to-scale score linear transformation applying the scaling constants (Table 12.4). For Writing, the lowest theta
estimates, associated with raw scores ranging from one to two, are linearly transformed to scale score values in the
range of seventeen to nineteen. By contrast, for Reading, the lowest theta estimates, associated with raw scores
ranging from one to two, are linearly transformed to scale score values in the range of ten to eleven. The gap in the
proportion of students at the scale scores between the LOSS value of ten and the scale score values around
seventeen to nineteen is an artifact of the scale score task force selecting the LOSS value of ten.


8 Due to smoothing of the kernel density function, in some figures, particularly those with small sample sizes, the line
representing the distribution may appear to remain above zero near the region.


[Figure: one density panel per grade; x-axis: Summative Scale Score (650 to 850); y-axis: Density (0 to 0.020)]
Figure 12.2 Distributions of ELA/L Scale Scores: Grades 3-11


Figure 12.3 Distributions of Reading Scale Scores: Grades 3-11


Figure 12.4 Distributions of Writing Scale Scores: Grades 3-11


12.4.2 Scale Score Cumulative Frequencies for ELA/L


The cumulative frequency distribution for the summative scale score is presented in Appendix 12.4 for ELA/L 
assessments. 


12.4.3 Summary Scale Score Statistics for ELA/L Groups 


Subgroup statistics for ELA/L full summative, Reading, and Writing scale scores are presented in Tables 12.5 and
12.6⁹ for ELA/L grades 3 and 9, respectively. The results for all ELA/L grades are provided in Appendix 12.5. Grade 3
ELA/L subgroup statistics are presented in Table 12.5.¹⁰ Mean scores were higher for female students relative to
male students. Mean scores were highest for Asian students and were lowest for Hispanic/Latino and Black/African
American students. Economically disadvantaged students performed less well than students who are not
economically disadvantaged. English learners (EL) performed less well than non-EL students. Students with
disabilities performed less well than students without disabilities. Patterns of mean scale scores were similar in
grades 4 through 8, although the ordering of ethnicity subgroups varied slightly.


Grade 9 subgroup statistics for ELA/L, Reading, and Writing scale scores are presented in Table 12.6.¹¹ Patterns of mean 
scores were very similar to those observed for grades 3 through 8. Mean scores were higher for female students than 
for male students. Mean scores were highest for White students and were lowest for Hispanic/Latino and 
black/African American students. Economically disadvantaged students performed less well than students who are 
not economically disadvantaged. Students with disabilities performed less well than students without disabilities. 
Similar patterns are observed in other high school assessments, with some small variations in the ordering of the 
ethnicity subgroups. Corresponding tables for grades 10 and 11 are presented in Appendix 12.5. 


9 Due to omitted demographic values, subgroup sample sizes in these tables may not sum to total sample size. 
10 Table A.12.47 in Appendix 12.5 is identical to Table 12.5. 
11 Table A.12.53 in Appendix 12.5 is identical to Table 12.6. 
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Group Type Group N Mean SD Min Max 
Full Summative Score                                           72,442  737.23  42.45  650  850
Gender                   Female                                35,570  742.86  42.63  650  850
                         Male                                  36,872  731.80  41.57  650  850
Ethnicity                American Indian/Alaska Native            196  733.54  38.23  650  839
                         Asian                                  4,357  764.22  40.31  650  850
                         Black/African American                26,106  723.36  39.21  650  850
                         Hispanic/Latino                       13,332  723.34  39.13  650  850
                         Native Hawaiian or Pacific Islander       95  736.84  42.64  651  832
                         Two or more races                      3,420  745.92  41.31  650  850
                         White                                 24,754  753.26  39.64  650  850
Economic Status          Not Economically Disadvantaged        37,541  752.70  40.29  650  850
                         Economically Disadvantaged            34,883  720.59  38.21  650  850
English Learner Status   Non-English Learner                   56,834  742.34  42.07  650  850
                         English Learner                       10,128  711.65  34.41  650  850
Disabilities             Students without Disabilities         61,539  742.60  40.79  650  850
                         Students with Disabilities            10,903  706.89  38.76  650  850
Reading Summative Score                                        72,442   45.90  17.37   10   90
Gender                   Female                                35,570   47.40  17.15   10   90
                         Male                                  36,872   44.45  17.45   10   90
Ethnicity                American Indian/Alaska Native            196   44.89  16.03   10   88
                         Asian                                  4,357   56.19  16.51   10   90
                         Black/African American                26,106   40.10  15.70   10   90
                         Hispanic/Latino                       13,332   39.75  15.77   10   90
                         Native Hawaiian or Pacific Islander       95    5.65  16.60   12   79
                         Two or more races                      3,420   49.59  16.79   10   90
                         White                                 24,754   52.94  16.43   10   90
Economic Status          Not Economically Disadvantaged        37,541   52.27  16.64   10   90
                         Economically Disadvantaged            34,883   39.05  15.42   10   90
English Learner Status   Non-English Learner                   56,834   48.07  17.21   10   90
                         English Learner                       10,128   34.90  13.59   10   90
Disabilities             Students without Disabilities         61,539   47.95  16.70   10   90
                         Students with Disabilities            10,903   34.29  16.44   10   90
Writing Summative Score                                        72,442   29.24  12.83   10   60
Gender                   Female                                35,570   31.45  12.52   10   60
                         Male                                  36,872   27.10  12.75   10   60
Ethnicity                American Indian/Alaska Native            196   27.96  12.30   10   54
                         Asian                                  4,357   36.68  11.16   10   60
                         Black/African American                26,106   25.69  12.57   10   60
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Group Type Group 
Ethnicity                Hispanic/Latino                       13,332   26.33  12.38   10   60
                         Native Hawaiian or Pacific Islander       95   27.89  14.16   10   60
                         Two or more races                      3,420   31.20  12.54   10   60
                         White                                 24,754   32.95  11.98   10   60
Economic Status          Not Economically Disadvantaged        37,541   33.25  11.90   10   60
                         Economically Disadvantaged            34,883   24.93  12.37   10   60
English Learner Status   Non-English Learner                   56,834   30.42  12.70   10   60
                         English Learner                       10,128   23.46  11.75   10   60
Disabilities             Students without Disabilities         61,539   30.79  12.36   10   60
                         Students with Disabilities            10,903   20.45  11.84   10   60


Note: *Economic status was based on participation in National School Lunch Program (NSLP): receipt of free or 
reduced-price lunch (FRL). 
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Group Type Group N Mean SD Min Max 
Full Summative Score                                            3,388  732.86  40.47  650  850
Gender                   Female                                 1,735  739.65  39.55  650  850
                         Male                                   1,653  725.73  40.20  650  850
Ethnicity                American Indian/Alaska Native            n/r     n/r    n/r  n/r  n/r
                         Asian                                     48  779.06  31.66  701  840
                         Black/African American                 2,367  725.61  35.12  650  827
                         Hispanic/Latino                          579  725.67  40.03  650  843
                         Native Hawaiian or Pacific Islander      n/r     n/r    n/r  n/r  n/r
                         Two or More Races                        n/r     n/r    n/r  n/r  n/r
                         White                                    330  784.25  34.27  650  850
Economic Status          Not Economically Disadvantaged           793  769.07  37.22  650  850
                         Economically Disadvantaged             2,594  721.80  34.52  650  840
English Learner Status   Non-English Learner                      n/r     n/r    n/r  n/r  n/r
                         English Learner                          284  705.59  34.49  650  830
Disabilities             Students without Disabilities          2,635  739.60  39.35  650  850
                         Students with Disabilities               753  709.27  35.16  650  841
Reading Summative Score                                         3,388   43.65  16.67   10   90
Gender                   Female                                 1,735   45.68  16.50   10   90
                         Male                                   1,653   41.52  16.60   10   90
Ethnicity                American Indian/Alaska Native            n/r     n/r    n/r  n/r  n/r
                         Asian                                     48   62.06  12.66   32   90
                         Black/African American                 2,367   40.71  14.54   10   84
                         Hispanic/Latino                          579   40.31  16.13   10   90
                         Native Hawaiian or Pacific Islander      n/r     n/r    n/r  n/r  n/r
                         Two or More Races                        n/r     n/r    n/r  n/r  n/r
                         White                                    330   65.22  13.73   10   90
Economic Status          Not Economically Disadvantaged           793   58.74  15.29   10   90
                         Economically Disadvantaged             2,594   39.04  14.17   10   89
English Learner Status   Non-English Learner                      n/r     n/r    n/r  n/r  n/r
                         English Learner                          284   32.22  13.42   10   79
Disabilities             Students without Disabilities          2,635   46.25  16.29   10   90
                         Students with Disabilities               753   34.55  14.70   10   86
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Table 12.6 Subgroup Performance for ELA/L: Grade 9 


Group Type Group N Mean SD Min Max 
Writing Summative Score                                         3,388   28.15  12.51   10   60
Gender                   Female                                 1,735   30.91  11.58   10   60
                         Male                                   1,653   25.26  12.79   10   54
Ethnicity                American Indian/Alaska Native            n/r     n/r    n/r  n/r  n/r
                         Asian                                     48   40.25   7.97   10   54
                         Black/African American                 2,367   26.36  11.75   10   54
                         Hispanic/Latino                          579   26.51  12.67   10   54
                         Native Hawaiian or Pacific Islander      n/r     n/r    n/r  n/r  n/r
                         Two or More Races                        n/r     n/r    n/r  n/r  n/r
                         White                                    330   40.48   9.61   10   60
Economic Status          Not Economically Disadvantaged           793   37.14  10.60   10   60
                         Economically Disadvantaged             2,594   25.41  11.73   10   54
English Learner Status   Non-English Learner                      n/r     n/r    n/r  n/r  n/r
                         English Learner                          284   20.80  11.83   10   53
Disabilities             Students without Disabilities          2,635   30.30  11.86   10   60
                         Students with Disabilities               753   20.63  11.80   10   53


Note: *Economic status was based on participation in National School Lunch Program (NSLP): receipt of free or 
reduced-price lunch (FRL). n/r = not reported due to n<20. 


12.4.4 Score Distributions for Mathematics 


Figure 12.5 graphically represents the distributions of scale scores for grades 3 through 8 mathematics. The y-axis for 
these distributions ranges from 0 to 0.02 and the x-axis from 650 to 850. Scale score distributions generally peaked 
between approximately 700 and the Level 4 performance level cut of 750. Figure 12.6 graphically represents the 
distributions of scale scores for Algebra I, Geometry, Algebra II, and Integrated Mathematics I and II. Scale score 
distributions generally peaked between approximately 700 and the 750 Level 4 performance level cut score for 
Algebra I and for Integrated Mathematics I and II. For Geometry and Algebra II, distributions peaked between 750 
and 800. Algebra | had a positively skewed distribution. 


12.4.5 Scale Score Cumulative Frequencies for Mathematics 


The cumulative frequency distribution for the summative scale score is presented in Appendix 12.4 for mathematics 
assessments. 
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12.4.6 Summary Scale Score Statistics for Mathematics Groups 


Subgroup statistics for mathematics scale scores are presented in Tables 12.7 through 12.9¹² for grade 3, Algebra I, and 
Integrated Mathematics I, respectively. Grade 3 subgroup statistics are presented in Table 12.7.¹³ Mean scores were 
similar for male and female students. Mean scores were highest for Asian students and were lowest for 
black/African American students. Economically disadvantaged students performed less well than students who are 
not economically disadvantaged. English learners (EL) performed less well than non-EL students. Students with 
disabilities performed less well than students without disabilities. Students using the Spanish Language form tended 
to have lower mean scores. Generally similar patterns were observed in other grades, with some slight variations in 
the orderings of the ethnicity subgroups. Corresponding tables for all grades/courses are presented in Appendix 
12.5. 


12 Due to omitted demographic values, subgroup sample sizes in these tables may not sum to total sample size. 
13 Table A.12.56 in Appendix 12.5 is identical to Table 12.7. 
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Figure 12.5 Distributions of Mathematics Scale Scores: Grades 3-8 
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Group Type Group N Mean SD Min Max 

Full Summative Score                                           79,070  741.90  38.02  650  850
Gender                   Female                                38,807  741.95  37.13  650  850
                         Male                                  40,263  741.86  38.86  650  850
Ethnicity                American Indian/Alaska Native            219  735.93  35.23  650  822
                         Asian                                  4,811  770.17  35.25  650  850
                         Black/African American                26,772  727.66  35.46  650  850
                         Hispanic/Latino                       15,070  730.89  34.83  650  850
                         Native Hawaiian or Pacific Islander      185  747.44  34.08  650  838
                         Two or more races                      4,174  748.34  36.78  650  850
                         White                                 27,554  755.69  34.80  650  850
Economic Status*         Not Economically Disadvantaged        43,882  754.51  35.79  650  850
                         Economically Disadvantaged            35,169  726.18  34.73  650  850
English Learner Status   Non-English Learner                   62,145  745.33  37.96  650  850
                         English Learner                       11,464  723.89  33.09  650  850
Disabilities             Students without Disabilities         67,214  746.05  36.59  650  850
                         Students with Disabilities            11,856  718.42  37.47  650  850
Language Form            Spanish                                  139  709.93  40.59  650  807


*Economic status was based on participation in National School Lunch Program (NSLP): receipt of free or 
reduced-price lunch (FRL). 


Algebra I scale score statistics are presented in Table 12.8.¹⁴ Mean scores were slightly higher for female students 
relative to male students. Mean scores were highest for Asian students and were lowest for black/African American 
students. Economically disadvantaged students performed less well than students who are not economically 
disadvantaged. English learners (EL) performed less well than non-EL students. Students with disabilities performed 
less well than students without disabilities. Students using the Spanish Language form tended to have lower mean 
scores. Similar patterns were observed in the other high school tests, with the previously mentioned variations in the 
ordering of the ethnicity subgroups applying to these tests as well. Corresponding tables are presented in 
Appendix 12.5. 


14 Table A.12.62 in Appendix 12.5 is identical to Table 12.8. 
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Group Type Group N Mean SD Min Max 

Full Summative Score                                           84,749  735.17  34.55  650  850
Gender                   Female                                41,214  736.26  33.63  650  850
                         Male                                  43,535  734.14  35.36  650  850
Ethnicity                American Indian/Alaska Native            200  731.53  29.35  650  835
                         Asian                                  5,384  764.06  37.11  650  850
                         Black/African American                31,315  720.80  26.90  650  850
                         Hispanic/Latino                       16,225  721.86  28.59  650  850
                         Native Hawaiian or Pacific Islander      176  743.73  28.06  678  820
                         Two or more races                      3,650  743.40  33.84  650  850
                         White                                 27,641  752.43  33.38  650  850
Economic Status*         Not Economically Disadvantaged        50,459  745.57  35.41  650  850
                         Economically Disadvantaged            34,201  719.90  26.68  650  850
English Learner Status   Non-English Learner                   74,256  738.14  34.30  650  850
                         English Learner                        7,253  708.67  23.32  650  850
Disabilities             Students without Disabilities         69,867  738.95  34.39  650  850
                         Students with Disabilities            14,882  717.41  29.35  650  850
Language Form            Spanish                                  705  704.80  22.37  650  805


*Economic status was based on participation in National School Lunch Program (NSLP): receipt of free or 
reduced-price lunch (FRL). 


Integrated Mathematics I scale score statistics are presented in Table 12.9.¹⁵ Tables for Integrated Mathematics II 
tests can be found in Appendix 12.5. Sample sizes for both tests are very small, and a number of subgroups did not have 
enough students for reporting purposes. Results should be interpreted with caution. 


15 Table A.12.55 in Appendix 12.5 is identical to Table 12.9. 
Table 12.9 Subgroup Performance for Mathematics Scale Scores: Integrated Mathematics I 


Group Type               Group                                      N    Mean     SD  Min  Max
Full Summative Score                                              144  729.53  27.16  662  816
Gender                   Female                                    84  730.30  26.85  678  816
                         Male                                      60  728.45  27.78  662  809
Ethnicity                American Indian/Alaska Native            n/r     n/r    n/r  n/r  n/r
                         Asian                                    n/r     n/r    n/r  n/r  n/r
                         Black/African American                    72  725.64  25.31  662  785
                         Hispanic/Latino                           54  729.02  28.08  678  816
                         Native Hawaiian or Pacific Islander      n/r     n/r    n/r  n/r  n/r
                         Two or More Races                        n/r     n/r    n/r  n/r  n/r
                         White                                    n/r     n/r    n/r  n/r  n/r
Economic Status*         Not Economically Disadvantaged            46  737.41  28.08  662  809
                         Economically Disadvantaged                98  725.83  26.05  678  816
English Learner Status   Non-English Learner                      n/r     n/r    n/r  n/r  n/r
                         English Learner                          n/r     n/r    n/r  n/r  n/r
Disabilities             Students without Disabilities            110  731.85  24.65  678  816
                         Students with Disabilities                34  722.03  33.38  662  809
Language Form            Spanish                                  n/r     n/r    n/r  n/r  n/r


*Economic status was based on participation in National School Lunch Program (NSLP): receipt of free or 
reduced-price lunch (FRL). n/r = not reported due to n<20. 


12.5 Interpreting Claim Scores and Subclaim Scores 


12.5.1 Interpreting Claim Scores 


ELA/L assessments provide separate claim scale scores for both Reading and Writing. The claim scale scores and the 
summative scale score are on different scales; therefore, the sum of the scale scores for each claim will not equal 
the summative scale score. Reading scale scores range from 10 to 90 and Writing scale scores range from 10 to 60. 


The claim scores can be interpreted by comparing a student’s claim scale score to the average performance for the 
school, district, and state. The Individual Student Report (ISR) provides the student scale score results and the 
average scale score results for the school, district, and state. 


12.5.2 Interpreting Subclaim Scores 


Within each reporting category are specific skill sets (subclaims) students demonstrate on the summative 
assessments. Subclaim categories are not reported using scale scores or performance levels. Subclaim performance 
for the assessments is reported using graphical representations that indicate how the student performed relative to 
the Level 3 and Level 4 performance levels for the content area. 


Subclaim indicators represent how well students performed in a subclaim category relative to Level 3 and Level 4 
thresholds for the items associated with the subclaim category. To determine a student’s subclaim performance, the 
Level 3 and Level 4 thresholds corresponding to the IRT based performance for the items for a given subclaim 
determined the reference points for Approached Expectations and Did Not Yet Meet Expectations or Partially Met 
Expectations, respectively. 


Student performance for each subclaim is marked with a subclaim performance indicator. 


An ‘up’ arrow for the specified subclaim indicates that the student Met or Exceeded Expectations, meaning 
that the student’s subclaim performance reflects a level of proficiency consistent with Performance Level 4 
or 5. Students in this subclaim category are likely academically well prepared to engage successfully in 
further studies in the subclaim content area and may need instructional enrichment. 


A ‘bidirectional’ arrow for the specified subclaim indicates that the student Approached Expectations, 
meaning that the student's subclaim performance reflects a level of proficiency consistent with 
Performance Level 3. Students in this subclaim category likely need academic support to engage 
successfully in further studies in the subclaim content area. 


A ‘down’ arrow for the specified subclaim indicates that the student Did Not Yet Meet or Partially Met 
Expectations, meaning that the student’s subclaim performance reflects a level of proficiency consistent 
with Performance Level 1 or 2. Students in this subclaim category are likely not academically well prepared 
to engage successfully in further studies in the subclaim content area. Such students likely need 
instructional interventions to increase achievement in the subclaim content area. 
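

The mapping from thresholds to indicators described above can be sketched as follows. This is only an illustration, not the operational scoring procedure: the function name `subclaim_indicator`, the idea of a single numeric subclaim performance value, the example numbers, and the choice to treat a value exactly at a threshold as reaching that level are all assumptions made for the sketch.

```python
def subclaim_indicator(performance, level3_threshold, level4_threshold):
    """Return the arrow shown for one subclaim (hypothetical helper).

    performance       -- student's performance on the subclaim items
    level3_threshold  -- reference point for Approached Expectations
    level4_threshold  -- reference point for Met or Exceeded Expectations
    """
    if level3_threshold > level4_threshold:
        raise ValueError("Level 3 threshold must not exceed Level 4 threshold")
    if performance >= level4_threshold:
        return "up"             # consistent with Performance Level 4 or 5
    if performance >= level3_threshold:
        return "bidirectional"  # consistent with Performance Level 3
    return "down"               # consistent with Performance Level 1 or 2


# Made-up values for illustration only.
indicators = [subclaim_indicator(p, 8.0, 11.0) for p in (12.0, 9.5, 4.0)]
```

With the made-up thresholds of 8 and 11, the three example students receive an up, a bidirectional, and a down arrow, in that order.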
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Section 13: Reliability 


13.1 Overview 


Reliability focuses on the extent to which differences in test scores reflect true differences in the knowledge, ability, 
or skill being tested rather than fluctuations due to chance. Thus, reliability measures the consistency of the scores 
across conditions that can be assumed to differ at random, especially which form of the test the student is 
administered and which raters are assigned to score responses to constructed-response questions. In statistical 
terms, the variance in the distributions of test scores, essentially the differences among individuals, is partly due to 
real differences in the knowledge, skill, or ability being tested (true variance) and partly due to random errors in the 
measurement process (error variance). Reliability is an estimate of the proportion of the total variance that is true 
variance. 


There are several different ways of estimating reliability. The type of raw score reliability estimate reported here is 
an internal-consistency measure, which is derived from analysis of the consistency of the performance of individuals 
across items within a test. It is used because it serves as a good estimate of alternate forms reliability, but it does not 
take into account form-to-form variation due to lack of test form parallelism, nor is it responsive to day-to-day 
variation due to, for example, the student’s state of health or the testing environment. The scale score reliability 
results use a modified measure of internal consistency that accounts for the conversions between raw scores and 
scale scores. 


Reliability coefficients range from 0 to 1. The higher the reliability coefficient for a set of scores, the more likely 
students would be to obtain very similar scores upon repeated testing occasions, if the students do not change in 
their level of the knowledge or skills measured by the test. The reliability estimates in the tables to follow attempt to 
answer the question, “How consistent would the scores of these students be over replications of the entire testing 
process?” 


Reliability of classification estimates the proportion of students who are accurately classified into proficiency levels. 
There are two kinds of classification reliability statistics: decision accuracy and decision consistency. Decision 
accuracy is the agreement between the classifications actually made and the classifications that would be made if 
the test scores were perfectly reliable. Decision consistency is the agreement between the classifications that would 
be made on two independent forms of the test. 
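

To make the definition of decision consistency concrete, the sketch below classifies made-up scale scores from two hypothetical forms into performance levels and computes the proportion of students placed in the same level both times. It illustrates the definition only and is not the estimation method used for this report. The helper names, the scores, and all cut scores other than the Level 4 cut of 750 are placeholders.

```python
from bisect import bisect_right


def classify(score, cuts):
    """Map a scale score to a 1-based performance level given ascending cut scores."""
    return bisect_right(cuts, score) + 1


def agreement_rate(levels_a, levels_b):
    """Proportion of students placed in the same level by two classifications."""
    if len(levels_a) != len(levels_b) or not levels_a:
        raise ValueError("need two non-empty lists of equal length")
    same = sum(a == b for a, b in zip(levels_a, levels_b))
    return same / len(levels_a)


# Hypothetical cut scores for Levels 2-5; only 750 (Level 4) comes from the report.
cuts = [700, 725, 750, 790]

# Made-up scores for the same five students on two independent forms.
form_1 = [688, 731, 752, 801, 749]
form_2 = [702, 728, 747, 795, 751]

levels_1 = [classify(s, cuts) for s in form_1]
levels_2 = [classify(s, cuts) for s in form_2]
consistency = agreement_rate(levels_1, levels_2)
```

Decision accuracy has the same form, with the second list replaced by the classifications that perfectly reliable scores would produce.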


Another index is inter-rater reliability for the human-scored constructed-response items, which measures the 
agreement between individual raters (scorers). The inter-rater reliability coefficient answers the question, “How 
consistent is the scoring such that a set of similarly trained raters would produce similar scores to those obtained?” 


Standard error of measurement (SEM) quantifies the amount of error in the test scores. SEM is the extent by which 
students’ scores tend to differ from the scores they would receive if the test were perfectly reliable. As the SEM 
increases, the variability of students’ observed scores is likely to increase across repeated testing. Observed scores 
with large SEMs pose a challenge to the valid interpretation of a single test score. 


Reliability and SEM estimates were calculated at the full assessment level and at the claim and subclaim levels. In 
addition, conditional SEMs were calculated and reported in Appendix 12.3. 
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13.2 Reliability and SEM Estimation 


13.2.1 Raw Score Reliability Estimation 


Coefficient alpha (Cronbach, 1951), which measures internal consistency reliability, is the most commonly used 
measure of reliability. Coefficient alpha is estimated by substituting sample estimates for the parameters in the 
formula below: 


$$\alpha = \frac{n}{n-1}\left(1 - \frac{\sum_{i=1}^{n}\sigma_i^2}{\sigma_X^2}\right) \qquad (13\text{-}1)$$

where $n$ is the number of items, $\sigma_i^2$ is the variance of scores on the $i$th item, and $\sigma_X^2$ is the variance of the total 
score (sum of scores on the individual items). Other things being equal, the more items a test includes, the higher 
the internal consistency reliability. 
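

As a minimal sketch of equation 13-1, the function below computes coefficient alpha from a small matrix of item scores (one row per student, one column per item). The function name and the four-student data set are invented for illustration. Population variances are used throughout; using sample variances for both the items and the total would give the same ratio.

```python
from statistics import pvariance


def coefficient_alpha(item_scores):
    """Coefficient alpha for a list of per-student lists of item scores."""
    n_items = len(item_scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across students.
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(n_items)]
    # Variance of the total (sum) score across students.
    total_var = pvariance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total score variance is zero")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Made-up responses: 4 students, 3 dichotomous items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = coefficient_alpha(responses)
```

For this toy data set the item variances sum to 0.625 and the total score variance is 1.25, so alpha is 1.5 × (1 − 0.5) = 0.75.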


Since the test forms have mixed item types (dichotomous and polytomous items), it is more appropriate to report 
stratified alpha (Feldt & Brennan, 1989). Stratified alpha is a weighted average of coefficient alphas for item sets 
with different maximum score points or “strata.” Stratified alpha is a reliability estimate computed by dividing the 
test into parts (strata), computing alpha separately for each part, and using the results to estimate a reliability 
coefficient for the total score. Stratified alpha is used here because different parts of the test consist of different 
item types and may measure different skills. The formula for the stratified alpha is: 


$$\rho_{\text{strata}} = 1 - \frac{\sum_{h}\sigma_{X_h}^2\,(1-\alpha_h)}{\sigma_X^2} \qquad (13\text{-}2)$$

where $\sigma_{X_h}^2$ is the variance for part $h$ of the test, $\sigma_X^2$ is the variance of the total scores, and $\alpha_h$ is coefficient 
alpha for part $h$ of the test. Estimates of stratified alpha are computed by substituting sample estimates for the 
parameters in the formula. The average stratified alpha is a weighted average of the stratified alphas across the test 
forms. 
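

Equation 13-2 can be sketched in a few lines. The example below assumes each stratum is supplied as its own item-score matrix with students in the same order; the function names and the data (one dichotomous stratum and one polytomous stratum for four students) are made up for illustration.

```python
from statistics import pvariance


def coefficient_alpha(item_scores):
    """Coefficient alpha (equation 13-1) for one set of items."""
    n_items = len(item_scores[0])
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(n_items)]
    total_var = pvariance([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


def stratified_alpha(strata):
    """Stratified alpha (equation 13-2).

    strata -- list of item-score matrices, one per stratum, each with one
              row per student and students in the same order.
    """
    n_students = len(strata[0])
    # Total score for each student across all strata.
    totals = [sum(sum(part[s]) for part in strata) for s in range(n_students)]
    total_var = pvariance(totals)
    # Sum over strata of (part variance) * (1 - part alpha).
    error = 0.0
    for part in strata:
        part_var = pvariance([sum(row) for row in part])
        error += part_var * (1 - coefficient_alpha(part))
    return 1 - error / total_var


dichotomous = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]  # 0/1 items
polytomous = [[3, 2], [2, 2], [1, 0], [0, 1]]               # 0-3 point items
rho_strata = stratified_alpha([dichotomous, polytomous])
```

When only one stratum is supplied, the expression reduces to coefficient alpha for that stratum, which is a convenient sanity check.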


The formula for the standard error of measurement is: 
$$\sigma_E = \sigma_X\sqrt{1-\rho_{XX'}} \qquad (13\text{-}3)$$

where $\sigma_X$ is the standard deviation of the test raw score and $\rho_{XX'}$ is the reliability estimated by substitution of 
appropriate statistics for the parameters in equation 13-1 or 13-2. 
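

A worked instance of equation 13-3 may help. The function name and the input values below are hypothetical and are not taken from any table in this report.

```python
import math


def raw_score_sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    if sd < 0:
        raise ValueError("standard deviation must be non-negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical form: raw score SD of 15 points, reliability of .91.
sem = raw_score_sem(15.0, 0.91)
```

Here the square root of 1 − .91 is .3, so the SEM is about 4.5 raw score points. Holding the SD fixed, a higher reliability always gives a smaller SEM.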


In this section, reliability estimates are reported for overall summative scores, claim scores, and subclaim scores. 
Estimates are also reported for subgroups for summative scores. Cronbach’s alpha and stratified alpha coefficients 
are influenced by test length, test characteristics, and sample characteristics (Lord & Novick, 1968; Tavakol & 
Dennick, 2011; Cortina, 1993). As test length decreases and samples become smaller and more homogeneous, lower 
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estimates of alpha are obtained (Tavakol & Dennick, 2011; Pike & Hudson, 1998). A decrease in the number of items 
may result in a decrease in stratified alpha estimates. Likewise, a smaller, more homogeneous sample will likely result in 
lower stratified alpha estimates. Moderate to acceptable ranges of reliability tend to exceed .5 (Cortina, 1993; 
Schmitt, 1996). Estimates lower than .5 may indicate a lack of internal consistency. Additional analyses investigate 
whether lower estimates of alpha are due to restriction in range of the sample. In these cases, the alpha estimates 
are not appropriate measures of internal consistency. As a result, sample-free reliability estimates, such as scale 
score reliability (Kolen et al., 1996), are also provided. 


13.2.2 Scale Score Reliability Estimation 


Like the stratified alpha coefficients, scale score reliability coefficients range from 0 to 1. The higher the reliability 
coefficient for a set of scores, the more likely individuals would be to obtain similar scores upon repeated testing 
occasions, if the students do not change in their level of the knowledge or skills measured by the test. Because the 
scale scores are computed from a total score and do not have an item-level component, a stratified alpha coefficient 
cannot be computed for scale scores. Instead, Kolen et al.’s (1996) method for scale score reliability was used. 


The general formula for a reliability coefficient, 

$$\rho_{XX'} = 1 - \frac{\sigma^2(E)}{\sigma^2(X)} \qquad (13\text{-}4)$$

involves the error variance, $\sigma^2(E)$, and the total score variance, $\sigma^2(X)$. Using Kolen et al.’s (1996) method, 
conditional raw score distributions are estimated using Lord and Wingersky’s (1984) recursion formula. The 
conditional raw score distributions are transformed into conditional scale score distributions. Denote $X$ as the raw 
sum score, ranging from 0 to the maximum possible raw score, and $S = s(X)$ as the resulting scale score after 
transformation. The conditional distribution of scale scores is written as $P(X = x \mid \theta)$. The mean and variance, 
$\sigma^2[s(X) \mid \theta]$, of this distribution can be computed using these scores and their associated probabilities. 


The average error variance of the scale scores is computed as 


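$$\sigma^2(\text{Error}_{\text{scale}}) = \int_{\theta} \sigma^2[s(X) \mid \theta]\, g(\theta)\, d\theta \qquad (13\text{-}5)$$

where $g(\theta)$ is the ability distribution. The square root of the conditional error variance is the conditional standard 
error of measurement of the scale scores. 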


Just as the reliability of raw scores is one minus the ratio of error variance to total variance, the reliability of scale 
scores is one minus the ratio of the average variance of measurement error for scale scores to the total variance of 
scale scores, 
$$\rho_{\text{scale}} = 1 - \frac{\sigma^2(\text{Error}_{\text{scale}})}{\sigma^2[s(X)]} \qquad (13\text{-}6)$$


The Windows program POLYCSEM (Kolen, 2004) was used to estimate scale score error variance and reliability. 
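

The steps in this subsection can be sketched as follows. This is a simplified illustration and not a reimplementation of POLYCSEM: it assumes dichotomous items under a two-parameter logistic model, a small grid of ability points with normal-density weights in place of the integral, and a made-up raw-to-scale conversion table. All function names, item parameters, and score values are hypothetical.

```python
import math


def lord_wingersky(probs):
    """Raw score distribution for dichotomous items, given P(correct) per item."""
    dist = [1.0]  # before any item, raw score 0 has probability 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for x, f in enumerate(dist):
            new[x] += f * (1.0 - p)  # item answered incorrectly
            new[x + 1] += f * p      # item answered correctly
        dist = new
    return dist


def p_correct(theta, a, b):
    """Two-parameter logistic item response function (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def scale_score_reliability(items, to_scale, thetas, weights):
    """One minus average conditional error variance over total scale score variance."""
    total_w = sum(weights)
    avg_err_var = 0.0  # discrete analogue of equation 13-5
    marg_mean = 0.0    # marginal mean of scale scores
    marg_sq = 0.0      # marginal mean of squared scale scores
    for theta, w in zip(thetas, weights):
        g = w / total_w
        dist = lord_wingersky([p_correct(theta, a, b) for a, b in items])
        mean = sum(f * to_scale[x] for x, f in enumerate(dist))
        second = sum(f * to_scale[x] ** 2 for x, f in enumerate(dist))
        avg_err_var += g * (second - mean ** 2)
        marg_mean += g * mean
        marg_sq += g * second
    total_var = marg_sq - marg_mean ** 2
    return 1.0 - avg_err_var / total_var  # equation 13-6


# Hypothetical 4-item test: (discrimination, difficulty) pairs.
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0), (1.5, 0.5)]
to_scale = [650, 700, 750, 800, 850]  # made-up conversion for raw scores 0-4
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
weights = [math.exp(-t * t / 2.0) for t in thetas]
rho_scale = scale_score_reliability(items, to_scale, thetas, weights)
```

Because the total variance equals the average conditional variance plus the variance of the conditional means, the result always falls between 0 and 1. A four-item test is used only to keep the example short; operational forms are much longer and include polytomous items.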


13.3 Reliability Results for Total Group 


13.3.1 Raw Score Reliability Results 


Tables 13.1 and 13.2 summarize test reliability estimates for the total testing group for ELA/L and mathematics, 
respectively. The section includes only spring 2019 results. The fall 2018 results are located in the Addendum.¹⁶ The 
tables provide the average reliability, which is estimated by averaging the internal consistency estimates computed 
for all the individual forms of the test, and the average raw score SEM. In addition, the number of forms, the sample 
sizes associated with the minimum and maximum reliabilities, and the average maximum possible score for each 
set of tests are provided. Estimates were calculated only for groups of 100 or more students administered a specific 
test form; therefore, reliability estimates for ELA/L grade 11 and Integrated Mathematics I were not calculated. 


English Language Arts/Literacy 
The average reliability estimates for grades 3 through 10 ELA/L range from a low of .92 to a high of .94. The tests for 
grades 3 through 5 have fewer maximum possible points than the tests for grades 6 through 11. The average raw 
score SEM is consistently between 5 percent and 6 percent of the maximum possible score. 


Table 13.1 Summary of ELA/L Test Reliability Estimates for Total Group 
Grade   Number     Avg. Max.        Avg. Raw    Average       Minimum Reliability   Maximum Reliability
Level   of Forms   Possible Score   Score SEM   Reliability   N        Alpha        N        Alpha
3       5          82               4.38        0.92          1,694    0.82         34,846   0.92
4       4          106              5.36        0.92          2,099    0.83         31,699   0.92
5       5          106              5.40        0.92          531      0.82         30,670   0.93
6       4          109              5.50        0.93          1,967    0.86         37,982   0.94
7       4          109              5.92        0.93          1,572    0.84         36,260   0.94
8       4          109              5.62        0.94          1,268    0.91         35,079   0.94
9       3          109              5.56        0.93          109      0.72         1,860    0.94
10      5          109              5.83        0.94          146      0.89         39,231   0.94
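

The statement that the average raw score SEM falls between 5 and 6 percent of the maximum possible score can be checked directly from the Table 13.1 values. The short sketch below copies the grade, average maximum possible score, and average raw score SEM columns from the table; the variable names are illustrative.

```python
# (grade, average maximum possible score, average raw score SEM) from Table 13.1
table_13_1 = [
    (3, 82, 4.38),
    (4, 106, 5.36),
    (5, 106, 5.40),
    (6, 109, 5.50),
    (7, 109, 5.92),
    (8, 109, 5.62),
    (9, 109, 5.56),
    (10, 109, 5.83),
]

# SEM expressed as a percentage of the maximum possible score, by grade.
sem_percent = {
    grade: round(100.0 * sem / max_score, 2)
    for grade, max_score, sem in table_13_1
}

all_within_range = all(5.0 <= pct <= 6.0 for pct in sem_percent.values())
```

Every grade falls inside the stated range, from roughly 5.05 percent (grade 6) to roughly 5.43 percent (grade 7).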
Mathematics 


The average reliability estimates for the mathematics assessments range from .88 to .95. All average reliability 
estimates are at least .92 except for grade 8. The raw score SEM consistently ranges from 4 percent to 6 percent of 
the maximum score. 


16 Addendum 13 provides a summary of reliability information for the fall 2018 administration. 
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Table 13.2 Summary of Mathematics Test Reliability Estimates for Total Group 


Grade   Number     Avg. Max.        Avg. Raw    Average       Minimum Reliability   Maximum Reliability
Level   of Forms   Possible Score   Score SEM   Reliability   N        Alpha        N        Alpha
3       5          66               3.54        0.95          779      0.90         122      0.95
4       5          66               3.67        0.95          799      0.90         38,354   0.95
5       5          66               3.64        0.94          142      0.86         37,673   0.95
6       4          66               3.41        0.95          105      0.84         37,020   0.95
7       4          66               3.42        0.93          122      0.75         29,514   0.93
8       4          66               2.90        0.88          191      0.67         19,437   0.89
A1      5          81               3.48        0.93          175      0.50         34,458   0.94
GO      2          81               3.83        0.95          5,749    0.95         6,254    0.95
A2      2          81               3.93        0.94          2,213    0.94         2,002    0.94
M2      1          80               3.72        0.92          146      0.92         146      0.92


A1=Algebra I, GO=Geometry, A2=Algebra II, M2=Integrated Mathematics II. 


13.3.2 Scale Score Reliability Results 


Tables 13.3 and 13.4 summarize scale score reliability estimates and SEMs for the total testing group for ELA/L and 
mathematics, respectively, for spring 2019. The tables provide average reliabilities by grade/course, which are 
estimated by averaging the reliability estimates computed for all forms of the test within the grade/course level. In 
addition, the number of forms and the minimum and maximum scale score reliabilities are provided. Since estimates 
of scale score reliability are sample independent, form-level results are included even for grades with low sample 
sizes; therefore, the numbers of forms listed in Tables 13.3 and 13.4 are in some cases larger than those listed in 
Tables 13.1 and 13.2. 


English Language Arts/Literacy 
The average scale score reliability estimates for grades 3 through 11 ELA/L range from .90 to .93. Scale score 
reliability estimates are at least .89 for all forms. The average scale score SEM ranges from 8.54 to 12.29. 


Mathematics 

The average scale score reliability estimates for the grades 3 through 8 mathematics assessments range from .90 
to .93, with the exception of grade 8 at .87. For the high school assessments, these quantities range from .85 to .90. 
For grades 3-7, the average scale score SEM ranges from 8.35 to 8.93, with grade 8 at 12.52. For high school tests, 
the average scale score SEM ranges from 9.13 to 12.60. 
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Table 13.3 Summary of ELA/L Test Scale Score Reliability Estimates for Total Group 


Grade          Number     Avg. Scale   Avg. Scale Score   Min. Scale Score   Max. Scale Score
Level          of Forms   Score SEM    Reliability        Reliability        Reliability
3              6          12.29        0.90               0.89               0.91
4              6          10.36        0.90               0.90               0.91
5              5          9.74         0.90               0.89               0.92
6              5          8.54         0.92               0.91               0.92
7              5          9.87         0.92               0.92               0.93
8              5          9.75         0.93               0.92               0.93
9              6          10.18        0.92               0.91               0.93
10             6          12.16        0.92               0.91               0.93
11             6          11.94        0.90               0.89               0.91


Table 13.4 Summary of Mathematics Test Scale Score Reliability Estimates for Total Group 


Grade/Course   Number     Avg. Scale   Avg. Scale Score   Min. Scale Score   Max. Scale Score
Level          of Forms   Score SEM    Reliability        Reliability        Reliability
3              5          8.93         0.93               0.93               0.93
4              5          8.35         0.93               0.93               0.93
5              5          8.63         0.92               0.92               0.93
6              5          8.65         0.92               0.92               0.92
7              6          8.84         0.90               0.88               0.90
8              6          12.52        0.87               0.85               0.89
A1             6          11.81        0.88               0.87               0.88
GO             6          9.13         0.90               0.88               0.91
A2             6          12.60        0.89               0.89               0.90
M1             5          11.82        0.88               0.86               0.90
M2             4          12.58        0.85               0.83               0.86


A1=Algebra I, GO=Geometry, A2=Algebra II, M1=Integrated Mathematics I, M2=Integrated Mathematics II. 


13.4 Reliability Results for Subgroups of Interest 


When the sample size was sufficiently large, raw score reliability and SEM were estimated for the groups identified 
for DIF analysis. Estimates were calculated only for groups of 100 or more students administered a specific test form. 
Due to low sample size, estimates for ELA/L grade 11 and Integrated Mathematics I and II were excluded. 
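

The reporting rule above can be sketched in a few lines. This is a minimal illustration, not the report's actual procedure: it assumes the raw score reliability is coefficient alpha (the subgroup tables label the statistic "Alpha"), and the function names and the student-by-item data layout are hypothetical.

```python
from statistics import variance

MIN_GROUP_SIZE = 100  # reporting threshold stated in the text


def coefficient_alpha(item_scores):
    """Coefficient alpha for a list of students, each a list of item scores."""
    n_items = len(item_scores[0])
    # Variance of each item across students.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(n_items)]
    # Variance of the total raw score across students.
    total_var = variance([sum(row) for row in item_scores])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


def report_alpha(item_scores, min_n=MIN_GROUP_SIZE):
    """Return alpha rounded to two decimals, or 'n/r' for groups below the threshold."""
    if len(item_scores) < min_n:
        return "n/r"
    return round(coefficient_alpha(item_scores), 2)
```

A group with fewer than 100 students on a form yields "n/r", mirroring the convention used in the subgroup tables.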


Tables 13.5 and 13.6 summarize test reliability for groups of interest for ELA/L grade 3 and mathematics grade 3, 
respectively. Corresponding information is provided in Appendix 13.1 for all ELA/L and mathematics grades. For each 
group, the average, minimum, and maximum reliability estimates are listed, as well as the sample sizes of the 
reported minimum and maximum reliabilities. Note that reliability estimates are dependent on score variance, and 
subgroups with smaller variance are likely to have lower reliability estimates than the total group. 
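

The dependence on score variance follows from the classical test theory relationship SEM = SD * sqrt(1 - reliability). The sketch below holds the SEM fixed and varies the score standard deviation. The SEM and SD values are hypothetical and chosen only for illustration, and the function names are not taken from the report.

```python
import math


def sem_from_reliability(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def reliability_from_sem(sd, sem):
    """Invert the same relationship: reliability = 1 - SEM^2 / SD^2."""
    return 1.0 - (sem / sd) ** 2


# Hypothetical values: the same measurement error applied to groups
# with progressively smaller score spread.
SEM = 4.38
for sd in (15.5, 12.0, 10.0):
    print(f"SD={sd:5.1f}  reliability={reliability_from_sem(sd, SEM):.2f}")
```

With the error term held constant, the group with the narrowest score distribution shows the lowest reliability, which is the pattern the note above describes for subgroups.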


13.4.1 Reliability Results for Gender 


English Language Arts/Literacy 
The average reliability estimates and the average SEMs for males and females are similar to the corresponding
values for the total group. For most tests, the average reliabilities for males and females are equal or
within .01 of each other. The SEMs for females are slightly higher than for males for the majority of tests.


Mathematics 
As with the ELA/L test components, the average reliability estimates and SEMs for males and females reflect the
corresponding values for the total group. For most tests, the average reliabilities for males and females
are equal or within .01 of each other. The SEMs for females are slightly higher than for males for the majority of tests.


13.4.2 Reliability Results for Ethnicity 


English Language Arts/Literacy 
The majority of the average reliabilities for the ethnicity groups are equal to or .01 to .03 lower than for the total
group. There is no consistent difference among the average reliabilities for white, black/African American,
Asian/Pacific Islander, Hispanic/Latino, American Indian/Alaska Native, and multiple-ethnicity students, with the
majority of the reliabilities between .90 and .94. Average SEMs were generally slightly higher for white, Asian/Pacific
Islander, and multiple-ethnicity students than for black/African American, Hispanic/Latino, and American
Indian/Alaska Native students.


Mathematics 
As with the ELA/L reliabilities, the average reliabilities for ethnicity groups are marginally lower than for the total
group of students, but similar across ethnic groups. Average SEMs were generally slightly higher for white,
Asian/Pacific Islander, and multiple-ethnicity students, and lower for black/African American and Hispanic/Latino
students.


13.4.3 Reliability Results for Special Education Needs 


English Language Arts/Literacy 
The average reliabilities for five groups of students (economically disadvantaged, not economically disadvantaged,
non-English learner, students with disabilities, and students without disabilities) are generally .01 to .02 less than the
average reliability for the total group of students. The majority of the average reliabilities range from .90 to .93. The
average reliabilities for English learner students are lower, generally ranging from .85 to .93. The SEMs are generally
higher for the larger student groups (not economically disadvantaged students, non-English learner students, and
students without disabilities).


Mathematics 
The average reliabilities for not economically disadvantaged students, non-English learner students, and students
without disabilities are generally equal to or .01 less than the average reliability for the total group of students. For
economically disadvantaged students and students with disabilities, the average reliabilities are generally .01 to .03
lower than those for the total group, while the average reliabilities for English learners range from .02 to .10 lower.
In general, the SEMs are higher for the larger student groups (not economically disadvantaged students, non-English
learner students, and students without disabilities).


13.4.4 Reliability Results for Students Taking Accommodated Forms 


English Language Arts/Literacy 
One of the four accommodation form types (text-to-speech) had sufficient sample sizes to allow for estimation of
reliability and SEM for grades 3 through 10. These reliability and SEM estimates were somewhat lower than for the
total group.


Mathematics 
The text-to-speech forms had sufficient sample sizes for reliability and SEM estimation across grades/subjects,
except for the Algebra II courses where the sample was not sufficient. For almost all tests in grades 3 through 8,
text-to-speech average reliabilities were .01 to .05 lower than the total group reliabilities. For high school tests,
text-to-speech average reliabilities were .09 to .10 lower than the total group reliabilities. SEMs were somewhat
lower than the total group SEMs across all grades.


13.4.5 Reliability Results for Students Taking Translated Forms 


Mathematics 
With the exception of Geometry and Algebra II, there were sufficient numbers of students taking the
Spanish-language form for reliability and SEM estimation. The average reliability ranged from .91 to .95 for grades 3
through 5, and .66 to .84 for grades 6 through 8 and Algebra I. The SEMs are generally lower for the students
administered the Spanish-language forms.


Table 13.5 Summary of Test Reliability for Groups of Interest: ELA/L Grade 3


                                  Max. Raw  Avg.    Average      Minimum Reliability  Maximum Reliability
Group                             Score     SEM     Reliability  N         Alpha      N         Alpha
Total Group                       82        4.38    0.92         1,694     0.82       34,846    0.92
Gender
  Male                            82        4.28    0.92         1,101     0.81       17,612    0.92
  Female                          82        4.49    0.92         593       0.83       17,234    0.92
Ethnicity
  White                           82                0.91         388       0.87       12,072    0.91
  Black or African American       82                0.90         831       0.72       12,468    0.91
  Asian/Pacific Islander          82                0.90         2,097     0.90       2,184     0.91
  American Indian/Alaska Native   n/r       n/r     n/r          n/r       n/r        n/r       n/r
  Hispanic/Latino                 82                0.91         140       0.80       6,353     0.91
  Multiple                        82        4.52    0.92         1,584     0.91       1,713     0.92
Special Instruction Needs
  Economically Disadvantaged      82        4.20    0.90         1,237     0.75       16,535    0.91
  Not Economically Disadvantaged  82        4.53    0.91         453       0.88       18,224    0.91
  English Learner                 82        4.10    0.88         349       0.74       4,768     0.89
  Non-English Learner             82        4.44    0.92         1,099     0.83       27,561    0.92
  Students with Disabilities      82        3.83    0.90         1,694     0.82       4,175     0.92
  Students without Disabilities   82        4.47    0.92         30,293    0.92       30,671    0.92
Students Taking Accommodated Forms
  ASL                             n/r       n/r     n/r          n/r       n/r        n/r       n/r
  Closed-Caption                  n/r       n/r     n/r          n/r       n/r        n/r       n/r
  Screen Reader                   n/r       n/r     n/r          n/r       n/r        n/r       n/r
  Text-to-Speech                  82                0.80         1,639     0.80       1,639     0.80


n/r = not reported due to n<100.


Table 13.6 Summary of Test Reliability for Groups of Interest: Mathematics Grade 3


                                  Max. Raw  Avg.    Average      Minimum Reliability  Maximum Reliability
Group                             Score     SEM     Reliability  N         Alpha      N         Alpha
Total Group                       66        3.54    0.95         779       0.90       122       0.95
Gender
  Male                            66        3.53    0.95         513       0.91       18,111    0.95
  Female                          66        3.55    0.94         266       0.88       16,881    0.94
Ethnicity
  White                           66        3.62    0.93         199       0.91       12,396    0.93
  Black/African American          66        3.41    0.94         275       0.89       11,488    0.94
  Asian/Pacific Islander          66        3.53    0.93         2,217     0.93       2,330     0.93
  American Indian/Alaska Native   n/r       n/r     n/r          n/r       n/r        n/r       n/r
  Hispanic/Latino                 66        3.47    0.94         246       0.88       121       0.95
  Multiple                        66        3.59    0.94         1,799     0.94       1,879     0.94
Special Instruction Needs
  Economically Disadvantaged      66        3.41    0.94         463       0.89       15,281    0.94
  Not Economically Disadvantaged  66        3.60    0.94         311       0.91       19,841    0.94
  English Learner                 66        3.39    0.93         260       0.89       122       0.95
  Non-English Learner             66        3.56    0.94         451       0.90       27,605    0.95
  Students with Disabilities      66        3.33    0.94         687       0.89       5,058     0.94
  Students without Disabilities   66        3.57    0.94         30,006    0.94       106       0.96
Students Taking Accommodated Forms
  ASL                             n/r       n/r     n/r          n/r       n/r        n/r       n/r
  Closed-Caption                  n/r       n/r     n/r          n/r       n/r        n/r       n/r
  Screen Reader                   n/r       n/r     n/r          n/r       n/r        n/r       n/r
  Text-to-Speech                  66                0.93         7,150     0.93       7,192     0.93
Students Taking Translated Forms
  Spanish Language Form           66                0.95         122       0.95       122       0.95


n/r = not reported due to n<100.


13.5 Reliability Results for English Language Arts/Literacy Claims and Subclaims 


Participating states and agencies developed subclaims in addition to major claims based on the Common Core 
State Standards. English language arts/literacy (ELA/L) has two major claims relating to Reading and Writing. The 
major claim for Reading is that students read and comprehend a range of sufficiently complex texts independently. 
The major claim for Writing is that students write effectively when using and/or analyzing sources. Refer to Table 
13.7 for a summary of the ELA/L claims and subclaims. 


Table 13.7 Descriptions of ELA/L Claims and Subclaims
English Language Arts/Literacy


Major Claim   Subclaim                        Description
Reading       Reading Literature              Students demonstrate comprehension and draw evidence from readings
                                              of grade-level, complex literary text.
Reading       Reading Information             Students demonstrate comprehension and draw evidence from readings
                                              of grade-level, complex informational text.
Reading       Reading Vocabulary              Students use context to determine the meaning of words and phrases.
Writing       Writing Written Expression      Students produce clear and coherent writing in which the development,
                                              organization, and style are appropriate to the task, purpose, and audience.
Writing       Writing Knowledge of Language   Students demonstrate knowledge of conventions and other important
              and Conventions                 elements of language.


Reliability indices were calculated for each major claim and subclaim. Table 13.8 presents the average reliability 
estimates for all forms of the test at the specified grade and testing mode for the ELA/L tests. The sample size for 
grade 11 was not sufficient for reliability analyses. In order to assist in understanding the reliability estimates, the 
range of maximum number of points for each major claim and subclaim is also provided. 


The average reliabilities for the Reading claim for grades 3 through 10 range from .88 to .91 with a median of .90. 
They are based on a maximum score of 64 points, except for grade 3 (44-46 points). The Writing claim average 
reliabilities are based on a lower number of points than those for the Reading claim, and are slightly lower, ranging 
from .84 to .89 with a median between .87 and .88. The reliabilities for the Writing claim for grades 3, 4, and 5 are 
based on maximum raw scores of 36 points, 42 points, and 42 points, respectively, while the average reliabilities 
for the grades 6 through 11 Writing claims are based on a maximum score of 45 points. 


The average reliabilities of the Reading Literature subclaim scores vary from .73 to .84, with a median between .78 
and .79. The maximum number of points per form ranges from 18 to 34. The average reliabilities of the Reading 
Information subclaim scores vary from .73 to .82, with a median of .76, and the maximum number of points per 
form ranges from 13 to 32. The average reliabilities of the Reading Vocabulary subclaim scores have a median of 
.67, and the reliabilities vary from .61 to .74. The maximum number of points per form for this subclaim ranges 
from 10 to 18. 


The Writing Written Expression subclaim is based on 27 points for grade 3 and 33 points for grades 4 and 5. Grades 
6 through 11 are based on 36 points. The median of the average reliabilities for the tests is .86 and the average 
reliabilities range from .80 to .90. The Writing Knowledge of Language and Conventions subclaims are all based on 
nine points. The median average reliability is .88 and reliabilities range from .87 to .90. 


Table 13.8 Average Reliability Estimates for ELA/L Claims and Subclaims


        Reading:         Reading:         Reading:         Reading:         Writing:         Writing: Written Writing: Knowledge of
        Total            Literature       Information      Vocabulary       Total            Expression       Language and Conventions
Grade   Max Raw  Avg.    Max Raw  Avg.    Max Raw  Avg.    Max Raw  Avg.    Max Raw  Avg.    Max Raw  Avg.    Max Raw  Avg.
Level   Score    Rel.    Score    Rel.    Score    Rel.    Score    Rel.    Score    Rel.    Score    Rel.    Score    Rel.
3       44-46    0.89    19-21    0.79    13-17    0.73    10-12    0.67    36-36    0.84    27-27    0.81    9-9      0.87
4       64-64    0.88    24-26    0.73    22-26    0.75    12-16    0.61    42-42    0.85    33-33    0.82    9-9      0.88
5       64-64    0.90    24-26    0.78    22-24    0.74    14-16    0.74    42-42    0.85    33-33    0.80    9-9      0.88
6       64-64    0.91    24-34    0.84    14-24    0.74    12-16    0.71    45-45    0.87    36-36    0.85    9-9      0.88
7       64-64    0.90    24-26    0.79    22-24    0.80    14-16    0.66    45-45    0.88    36-36    0.87    9-9      0.89
8       64-64    0.90    24-32    0.79    14-26    0.77    14-18    0.68    45-45    0.89    36-36    0.90    9-9      0.90
9       64-64    0.90    18-26    0.76    24-32    0.82    12-14    0.67    45-45    0.88    36-36    0.87    9-9      0.88
10      64-64    0.90    20-26    0.74    24-32    0.82    12-14    0.65    45-45    0.89    36-36    0.89    9-9      0.90


Max Raw Score = range of maximum raw score across forms; Avg. Rel. = average reliability.


13.6 Reliability Results for Mathematics Subclaims 


For mathematics, there are four subclaims related to whether students are on track or ready for college and 
careers: 


• Subclaim A: Students solve problems involving the major content for their grade/course level with
connections to the Standards for Mathematical Practice.


• Subclaim B: Students solve problems involving the additional and supporting content for their
grade/course level with connections to the Standards for Mathematical Practice.


• Subclaim C: Students express grade/course-level appropriate mathematical reasoning by constructing
viable mathematical arguments and critiquing the reasoning of others, and/or attending to precision
when making mathematical statements.


• Subclaim D: Students solve real-world problems with a degree of difficulty appropriate to the
grade/course by applying knowledge and skills articulated in the standards and by engaging particularly in
the modeling practice.


Reliability estimates were calculated for each subclaim for mathematics. Table 13.9 presents the average reliability 
estimates for mathematics subclaims. The sample size for Integrated Mathematics I was not sufficient for reliability 
analyses. 


Subclaims with greater numbers of points tend to have greater reliability estimates. The Major Content subclaim 
has the largest number of points for each assessment and, accordingly, has higher average reliabilities than the 
other three subclaims. The average reliability for the Major Content subclaim for grade 8 is .78, with the other 
grades ranging from .81 to .91 with a median of .89. The maximum number of points per form ranges from 23 to 32. 
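

The link between score points and reliability can be illustrated with the Spearman-Brown prophecy formula. This is a sketch under strong assumptions that the report does not make: the added points are treated as parallel to the existing ones, and the number of raw score points is used as a stand-in for test length. The function name is hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by length_factor."""
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)


# Grade 3 values reported for the mathematics subclaims: Additional & Supporting
# Content has 12 points (average reliability .77); Major Content has 28 points
# (average reliability .91).
predicted = spearman_brown(0.77, 28 / 12)
print(f"Predicted reliability at 28 points: {predicted:.2f}")
```

Under these assumptions, a 12-point subclaim with reliability .77 would be expected to reach roughly .89 if extended to 28 points, which is close to the .91 reported for the 28-point Major Content subclaim at grade 3.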


The average reliability for the Additional and Supporting Content subclaim is .54 for grade 8. Average reliabilities 
for the other grades range from .67 to .82, with a median of .74. The maximum number of points per form for this 
subclaim ranges from 9 to 25. 


The average reliabilities for Mathematics Reasoning range from .62 to .80, with a median between .71 and .72. The 
maximum number of points for this subclaim is 14 for all grades and forms. 


For the Modeling Practice subclaim, the average reliability is .53 for grade 8 and .67-.78, with a median of .74, for
the other grades. The maximum number of points is 12 for grades 3 through 8, 15-18 for Integrated Mathematics
II, and 18 for all other high school courses.


Table 13.9 Average Reliability Estimates for Mathematics Subclaims


        Major Content                    Additional & Supporting Content  Mathematics Reasoning            Modeling Practice
Grade   Range of Max   Average           Range of Max   Average           Range of Max   Average           Range of Max   Average
Level   Raw Score      Reliability       Raw Score      Reliability       Raw Score      Reliability       Raw Score      Reliability
3       28-28          0.91              12-12          0.77              14-14          0.63              12-12          0.77
4       31-31          0.90              9-9            0.72              14-14          0.80              12-12          0.67
5       30-30          0.90              10-10          0.70              14-14          0.72              12-12          0.75
6       26-26          0.88              14-14          0.76              14-14          0.78              12-12          0.71
7       29-29          0.87              11-11          0.67              14-14          0.65              12-12          0.74
8       27-27          0.78              13-13          0.54              14-14          0.62              12-12          0.53
A1      25-26          0.81              15-17          0.74              14-14          0.78              18-18          0.77
GO      30-30          0.89              19-19          0.82              14-14          0.74              18-18          0.78
A2      23-27          0.84              20-20          0.80              14-14          0.71              18-18          0.69
M2      23-32          0.82              17-25          0.73              14-14          0.66              15-18          0.69


Note: A1 = Algebra I, GO = Geometry, A2 = Algebra II, M2 = Integrated Mathematics II. Integrated Mathematics I had insufficient sample sizes.


New Meridian 


February 28, 2020 


Page 131 


2019 Technical Report 


13.7 Reliability of Classification 


The reliability of the classifications for the students was calculated using the computer program BB-CLASS 
(Brennan, 2004), which operationalizes a statistical method developed by Livingston and Lewis (1993, 1995). As 
Livingston and Lewis (1993, 1995) explain, this method uses information from the administration of one test form 
(i.e., distribution of scores, the minimum and maximum possible scores, the cut points used for classification, and 
the reliability coefficient) to estimate two kinds of statistics, decision accuracy and decision consistency. Decision
accuracy refers to the extent to which the classifications of students based on their scores on the test form agree
with the classifications that would be made if the test scores were perfectly reliable. Decision consistency refers
to the agreement between classifications based on two non-overlapping, equally difficult forms of the test.


Decision consistency values are always lower than the corresponding decision accuracy values, because in decision 
consistency, both of the classifications are subject to measurement error. In decision accuracy, only one of the 
classifications is based on a score that contains error. It is not possible to know which students were accurately 
classified, but it is possible to estimate the proportion of the students who were accurately classified. Similarly, it is 
not possible to know which students would be consistently classified if they were retested with another form, but 
it is possible to estimate the proportion of the students who would be consistently classified. 
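

A small simulation can make the difference between the two statistics concrete. It is not the Livingston and Lewis procedure and does not reproduce BB-CLASS. It is a simplified sketch that assumes normally distributed true scores and errors, uses the grade 3 ELA/L scale score ranges as cut points, and picks a hypothetical mean, standard deviation, and SEM.

```python
import bisect
import random

CUTS = [700, 725, 750, 810]  # lower bounds of Levels 2-5 on the grade 3 ELA/L scale


def level(score):
    """Map a scale score to a performance level from 1 to 5."""
    return bisect.bisect_right(CUTS, score) + 1


def simulate(n=20000, true_mean=740.0, true_sd=30.0, sem=10.0, seed=1):
    """Return (decision accuracy, decision consistency) for simulated students."""
    rng = random.Random(seed)
    accurate = consistent = 0
    for _ in range(n):
        true = rng.gauss(true_mean, true_sd)   # error-free score (hypothetical)
        form_a = true + rng.gauss(0.0, sem)    # observed score on one form
        form_b = true + rng.gauss(0.0, sem)    # observed score on a parallel form
        accurate += level(form_a) == level(true)
        consistent += level(form_a) == level(form_b)
    return accurate / n, consistent / n


accuracy, consistency = simulate()
print(f"accuracy={accuracy:.2f}  consistency={consistency:.2f}")
```

In this setup the consistency estimate comes out below the accuracy estimate, because it compares two error-prone scores with each other rather than one error-prone score with the true score.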


13.7.1 English Language Arts/Literacy 


Table 13.10 provides information about the accuracy and the consistency of two types of classifications made on 
the basis of the summative scale scores on the grades 3 through 10 ELA/L assessments. Grade 11 ELA/L was not 
reported due to low sample size. The columns labeled “Exact level” provide the estimates of the indices based on 
classifications of students into one of five performance levels. The columns labeled “Level 4 or higher vs. 3 or 
lower” provide the estimates of the indices based on classifications of students as being either in one of the upper 
two levels (Levels 4 and 5) or in one of the lower three levels (Levels 1, 2, and 3). Performance Level 4 is considered 
the College and Career Readiness standard on the summative assessments. 


The table shows that for classifying each student into one of the five performance levels, the proportion accurately 
classified ranges from .74 to .78 with a median of .75; the proportion who would be consistently classified on two 
different test forms ranges from .64 to .69 with a median of .66. For classifying each student as being at Level 4 or 
higher vs. being at Level 3 or lower, the proportion accurately classified ranges from .91 to .93 with a median of 
.92; the proportion who would be consistently classified this way on two different test forms ranges from .88 to 
.90 with a median of .89. 


Table 13.10 Reliability of Classification: Summary for ELA/L


        Decision Accuracy: Proportion          Decision Consistency: Proportion
        Accurately Classified                  Consistently Classified
Grade   Exact    Level 4 or higher             Exact    Level 4 or higher
Level   Level    vs. 3 or lower                Level    vs. 3 or lower
3       0.75     0.92                          0.66     0.88
4       0.74     0.91                          0.64     0.88
5       0.76     0.91                          0.67     0.88
6       0.78     0.92                          0.69     0.89
7       0.74     0.92                          0.65     0.89
8       0.77     0.93                          0.68     0.90
9       0.74     0.93                          0.65     0.90
10      0.75     0.93                          0.66     0.90
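

The ranges and medians quoted for Table 13.10 can be checked directly from the tabled values. The sketch below uses only the Python standard library; the variable and function names are illustrative.

```python
from statistics import median

# Values as listed in Table 13.10 for grades 3 through 10.
accuracy_exact = [0.75, 0.74, 0.76, 0.78, 0.74, 0.77, 0.74, 0.75]
accuracy_two_level = [0.92, 0.91, 0.91, 0.92, 0.92, 0.93, 0.93, 0.93]
consistency_exact = [0.66, 0.64, 0.67, 0.69, 0.65, 0.68, 0.65, 0.66]
consistency_two_level = [0.88, 0.88, 0.88, 0.89, 0.89, 0.90, 0.90, 0.90]


def summarize(values):
    """Return (minimum, median, maximum) of a list of proportions."""
    return min(values), median(values), max(values)


for name, values in [
    ("accuracy, exact level", accuracy_exact),
    ("accuracy, Level 4+ vs. 3-", accuracy_two_level),
    ("consistency, exact level", consistency_exact),
    ("consistency, Level 4+ vs. 3-", consistency_two_level),
]:
    low, mid, high = summarize(values)
    print(f"{name}: {low:.2f} to {high:.2f}, median {mid:.2f}")
```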


Table 13.11 provides more detailed information about the accuracy and the consistency of the classification of
students into performance levels for ELA/L grade 3. Each cell in the 5-by-5 table shows the estimated proportion of
students who would be classified into a particular combination of performance levels. The sum of the five values
on the diagonal, where the two classifications agree, is approximately equal to the level of decision accuracy or
consistency presented in Table 13.10. For “Level 4 or higher vs. 3 or lower” in Table 13.10, the sum of the values in
the cells where both classifications fall in Levels 1 through 3 or both fall in Levels 4 and 5 is approximately equal
to the level of decision accuracy or consistency presented in Table 13.10. Note that the sums based on values in
Table 13.11 may not match the values in Table 13.10 exactly due to truncation and rounding.


Detailed information for ELA/L spring results is provided in Appendix 13.2, Tables A.13.18 through A.13.25. Fall
block results for ELA/L grades 9 through 11 are provided in the Addendum. The structure of these tables is the
same as that of Table 13.11, and the values in the tables should be interpreted in the same manner.


Table 13.11 Reliability of Classification: Grade 3 ELA/L


                       Full Summative                                                     Category
                       Scale Score      Level 1   Level 2   Level 3   Level 4   Level 5   Total
Decision Accuracy      650-699          0.18      0.03      0.00      0.00      0.00      0.20
                       700-724          0.03      0.11      0.04      0.00      0.00      0.19
                       725-749          0.00      0.04      0.12      0.04      0.00      0.21
                       750-809          0.00      0.00      0.04      0.31      0.02      0.37
                       810-850          0.00      0.00      0.00      0.01      0.02      0.03
Decision Consistency   650-699          0.17      0.04      0.00      0.00      0.00      0.21
                       700-724          0.04      0.09      0.05      0.01      0.00      0.18
                       725-749          0.00      0.04      0.10      0.05      0.00      0.20
                       750-809          0.00      0.01      0.05      0.28      0.02      0.36
                       810-850          0.00      0.00      0.00      0.02      0.02      0.04
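

The sums described for Table 13.11 can be reproduced with a short sketch. The matrices below transcribe the grade 3 ELA/L cell values, with rows ordered by scale score range and columns by Level 1 through Level 5. The function names are illustrative, and the results are rounded to two decimals to avoid floating-point noise.

```python
ACCURACY = [
    [0.18, 0.03, 0.00, 0.00, 0.00],
    [0.03, 0.11, 0.04, 0.00, 0.00],
    [0.00, 0.04, 0.12, 0.04, 0.00],
    [0.00, 0.00, 0.04, 0.31, 0.02],
    [0.00, 0.00, 0.00, 0.01, 0.02],
]
CONSISTENCY = [
    [0.17, 0.04, 0.00, 0.00, 0.00],
    [0.04, 0.09, 0.05, 0.01, 0.00],
    [0.00, 0.04, 0.10, 0.05, 0.00],
    [0.00, 0.01, 0.05, 0.28, 0.02],
    [0.00, 0.00, 0.00, 0.02, 0.02],
]


def exact_level_agreement(table):
    """Sum of the diagonal: both classifications land in the same level."""
    return round(sum(table[i][i] for i in range(len(table))), 2)


def two_category_agreement(table, n_lower=3):
    """Sum of cells where both classifications are Level 3 or lower, or both Level 4 or higher."""
    size = len(table)
    total = 0.0
    for i in range(size):
        for j in range(size):
            if (i < n_lower) == (j < n_lower):
                total += table[i][j]
    return round(total, 2)


for name, table in [("accuracy", ACCURACY), ("consistency", CONSISTENCY)]:
    print(name, exact_level_agreement(table), two_category_agreement(table))
```

The results differ from the corresponding grade 3 entries in Table 13.10 by about .01, which is consistent with the note above about truncation and rounding of the cell values.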


13.7.2 Mathematics 


Table 13.12 provides information about the accuracy and the consistency of two types of classifications made on
the basis of the summative scale scores on the mathematics assessments. Integrated Mathematics I was omitted
due to low sample size. For the grades 3 through 8 mathematics tests, the table shows that for classifying each
student into one of the five performance levels, the proportion accurately classified ranges from .74 to .81 with a
median of .79; the proportion who would be consistently classified on two different test forms ranges from .66 to
.73 with a median of .70. For the four high school mathematics courses, the table shows that for classifying each
student into one of the five performance levels, the proportion accurately classified ranges from .75 to .81 with a
median of .78; the proportion who would be consistently classified on two different test forms ranges from .65 to
.73 with a median of .69.


For classifying each student as being at Level 4 or higher vs. being at Level 3 or lower, for the grades 3 through 8
mathematics tests, the proportion accurately classified ranges from .93 to .94; the proportion who would be
consistently classified on two different test forms is .90 for grade 5 and .91 for the other grades. For the four high
school mathematics courses, the proportion accurately classified as being at Level 4 or higher vs. being at Level 3
or lower is .92 for Integrated Mathematics II and .93 for the other courses; the proportion who would be consistently
classified on two different test forms ranges from .89 to .90.


Appendix 13.2 Tables A.13.26 through A.13.35 provide more detailed information about the accuracy and the
consistency of the classification of students into performance levels for mathematics. Each cell in the 5-by-5 table
shows the estimated proportion of students who would be classified into a particular combination of performance
levels. Fall block results for Algebra I, Geometry, and Algebra II are provided in the Addendum.


Table 13.12 Reliability of Classification: Summary for Mathematics


               Decision Accuracy: Proportion          Decision Consistency: Proportion
               Accurately Classified                  Consistently Classified
Grade/Course   Exact    Level 4 or higher             Exact    Level 4 or higher
Level          Level    vs. 3 or lower                Level    vs. 3 or lower
3              0.79     0.93                          0.71     0.91
4              0.80     0.93                          0.73     0.91
5              0.78     0.93                          0.69     0.90
6              0.81     0.94                          0.73     0.91
7              0.78     0.94                          0.70     0.91
8              0.74     0.94                          0.66     0.91
A1             0.77     0.93                          0.68     0.90
GO             0.81     0.93                          0.73     0.90
A2             0.79     0.93                          0.71     0.89
M2             0.75     0.92                          0.65     0.89


Note: A1 = Algebra I, GO = Geometry, A2 = Algebra II, M1 = Integrated Mathematics I, M2 = Integrated
Mathematics II, M3 = Integrated Mathematics III.


13.8 Inter-rater Agreement 


Inter-rater agreement is the agreement between the first and second scores assigned to student responses.
Inter-rater agreement measurements include exact, adjacent, and nonadjacent agreement. Pearson scoring staff used
these statistics as one factor in determining the needs for continuing training and intervention on both individual
and group levels. Table 13.13 displays both the expectations and the actual agreement percentages for perfect
agreement and perfect plus adjacent agreement.


Table 13.13 Inter-rater Agreement Expectations and Results


              Score Point   Perfect Agreement   Perfect Agreement   Within One Point   Within One Point
Subject       Range         Expectation         Result              Expectation        Result
Mathematics   0-1           90%                 98%                 100%               100%
Mathematics   0-2           80%                 97%                 100%               100%
Mathematics   0-3           70%                 95%                 100%               99%
Mathematics   0-4           65%                 94%                 99%                99%
Mathematics   0-5           65%                 93%                 99%                98%
Mathematics   0-6           65%                 95%                 99%                98%
ELA/L         Multi-trait   65%                 80%                 100%               99%


Note: A 0 or 1 score compared to a blank score will have a disagreement greater than 1 point.


Pearson’s ePEN2 scoring system included comprehensive inter-rater agreement reports that allowed supervisory 
personnel to monitor both individual and group performance. Based on reviews of these reports, scoring experts 
targeted individuals for increased backreading and feedback and, if necessary, retraining. Table 13.13 shows that
the actual percentages for perfect reader agreement were higher than the inter-rater agreement expectations,
and the percentages for within one point were very close to the expectations. Refer to Section 4 for more
information on handscoring.
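

The agreement categories can be computed from paired first and second scores as in the following sketch. It is an illustration only and does not describe how ePEN2 calculates its reports. The function name is hypothetical, and blank scores are represented as `None` under the assumption that a blank paired with any numeric score counts as nonadjacent while two blanks count as exact agreement.

```python
def agreement_rates(first_scores, second_scores):
    """Percent exact, adjacent, nonadjacent, and within-one-point agreement."""
    counts = {"exact": 0, "adjacent": 0, "nonadjacent": 0}
    for first, second in zip(first_scores, second_scores):
        if first is None or second is None:
            # Assumption: a blank paired with a score is nonadjacent; two blanks agree.
            key = "exact" if first is second else "nonadjacent"
        else:
            diff = abs(first - second)
            key = "exact" if diff == 0 else "adjacent" if diff == 1 else "nonadjacent"
        counts[key] += 1
    n = len(first_scores)
    rates = {name: 100 * count / n for name, count in counts.items()}
    rates["within_one"] = rates["exact"] + rates["adjacent"]
    return rates


# Hypothetical paired scores on a 0-3 item; None marks a blank score.
print(agreement_rates([0, 1, 2, 3, None], [0, 2, 2, 1, 1]))
```

The "within_one" rate corresponds to the perfect plus adjacent agreement columns of Table 13.13, and "exact" corresponds to perfect agreement.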


Table 13.13 is identical to Table 4.5 in Section 4.


Section 14: Validity 


14.1 Overview 


The Standards for Educational and Psychological Testing, issued jointly by the American Educational Research
Association [AERA], American Psychological Association [APA], and National Council on Measurement in Education
[NCME] (2014), states:


Validity refers to the degree to which evidence and theory support the interpretations of test 
scores for proposed uses of tests. Validity is, therefore, the most fundamental consideration in 
developing tests and evaluating tests. The process of validation involves accumulating relevant 
evidence to provide a sound scientific basis for the proposed score interpretations (p. 11). 


The purpose of test validation is not to validate the test itself but to validate interpretations of the test scores for 
particular uses. Test validation is not a quantifiable property but an ongoing process, beginning at initial 
conceptualization and continuing throughout the lifetime of an assessment. Every aspect of an assessment 
provides evidence in support of its validity (or evidence of lack of validity), including design, content specifications, 
item development, and psychometric characteristics. The 2018-2019 operational assessment provided an 
opportunity to gather evidence of validity based on both test content and on the internal structure of the tests. 


Pearson applies the principles of universal design, as articulated in materials developed by the National Center for 
Educational Outcomes (NCEO) at the University of Minnesota (Thompson et al., 2002). 


14.2 Evidence Based on Test Content 


Evidence based on content of achievement tests is supported by the degree of correspondence between test items 
and content standards. The degree to which the test measures what it claims to measure is known as construct 
validity. The summative assessments adhere to the principles of evidence-centered design, in which the standards 
to be measured (the Common Core State Standards) are identified, and the performance a student needs to 
achieve to meet those standards is delineated in the evidence statements. Test items are reviewed for adherence 
to universal design principles, which maximize the participation of the widest possible range of students. 


Pearson and New Meridian built spreadsheets at the evidence statement level that incorporate the probability
statements from the test blueprints and attrition rates at committee review and data review. All item
development is driven by these item development target spreadsheets. Before beginning
item development, Pearson uses these target spreadsheets to develop an internal item development plan that
corresponds to the expectations of the test design. These are reviewed and approved by state or agency leads and
New Meridian. All parties acknowledge that each assessment has multiple parts and each part specifies the types 
of tasks and standards eligible for assessment. 


In addition to the evidence statements, content is aligned through the articulation of performance in the 
performance level descriptors. At the policy level, the performance level descriptors include policy claims about 
the educational achievement of students who attain a particular performance level, and a broad description of the 
grade-level knowledge, skills, and practices students performing at a particular achievement level are able to
demonstrate. Those policy-level descriptors are the foundation for the subject- and grade-specific performance
level descriptors, which, along with the evidence frameworks, guide the development of the items and tasks. 


The college- and career-ready determinations (CCRD) in English language arts/literacy (ELA/L) and mathematics 
describe the academic knowledge, skills, and practices students must demonstrate to show readiness for success 
in entry-level, credit-bearing college courses and relevant technical courses. The states and agencies determined 
that this level means graduating from high school and having at least a 75 percent likelihood of earning a grade of 
“C” or better in credit-bearing courses without the need for remedial coursework. After reviewing the standards 
and assessment design, the Governing Board (made up of the K-12 education chiefs in participating states or 
agencies) in conjunction with the Advisory Committee on College Readiness (composed of higher education chiefs 
in the participating states or agencies), determined that students who achieve at Levels 4 and 5 on the final high 
school assessments are likely to have acquired the skills and knowledge to meet the definition of college- and 
career-readiness. To validate the determinations, a postsecondary educator judgment study and a benchmark 
study of the SAT, ACT, National Assessment of Educational Progress (NAEP), Trends in International Mathematics 
and Science Study (TIMSS), Programme for International Student Assessment (PISA), and Progress in International 
Reading Literacy Study (PIRLS) tests were conducted (McClarty et al., 2015). 


Gathering construct validity evidence for the assessments is embedded in the process by which the assessment 
content is developed and validated. At each step in the assessment development process, participating states or 
agencies involved hundreds of educators, assessment experts, and bias and sensitivity experts in review of text, 
items, and tasks for accuracy, appropriateness, and freedom from bias. See Section 2 for an overview of the 
content development process. In the early stages of development, Pearson conducted research studies to validate 
the item and task development approach. One such study was a student task interaction study designed to collect 
data on the student’s experience with the assessment tasks and technological functionalities, as well as the 
amount of time needed for answering each task. Pearson also conducted a rubric choice study that compared the 
functioning of two rubrics developed to score the prose constructed-response (PCR) tasks in ELA/L. Quantitative 
and qualitative evidence was collected to support the use of a condensed or expanded trait scoring rubric in 
scoring student responses. 


The items and tasks were field tested prior to their use on an assessment. During the initial field test 
administration in 2014, participating states and agencies collected feedback from students, test administrators, 
test coordinators, and classroom teachers on their experience with the assessments, including the quality of test 
items and student experience. Information pertaining to this process can be found at 
https://resources.newmeridiancorp.org/research/. The feedback from that survey was used to inform test 
directions, test timing, and the function of online task interactions. Performance data from the field test also 
informed the future development of additional items and tasks. 


All item developers and item writers are provided an electronic version of the accessibility guidelines and the 
linguistic complexity rubric. Items and passages are reviewed internally by accessibility and fairness experts who 
are trained in the principles of universal design and are well versed in the accessibility guidelines. Items received 
internal reviews for alignment to the evidence tables, task generation models, and item selection guidelines, as 
well as accessibility and fairness reviews. 


An important consideration when constructing test forms is recognition of items that may introduce construct- 
irrelevant variance. Such items should not be included on test forms to help ensure fairness to all subgroups of 
students. New Meridian convened bias and sensitivity committees to review all items. Additionally, content 
experts facilitated reviews of all items. All reviewers were trained using the bias and sensitivity guidelines, and the 
guidelines were used to review items and ELA/L passages. Accommodations were made available based on 
individual need documented in the student’s approved IEP, 504 Plan, or, if required by the participating state or 
agency, an English Learner (EL) Plan. An accessibility specialist worked in consultation with the accessibility 
specialist to review forms and determine which forms should be used for students with accommodations. 


The ELA/L and mathematics operational test forms, as described in Section 2, were carefully constructed to align 
with the test blueprints and specifications that are based on the Common Core State Standards (CCSS). During the 
fall of 2016, content experts representing various participating states and agencies, along with other content 
experts, held a series of meetings to review the operational forms for ELA/L and mathematics. These meetings 
provided opportunity to evaluate test forms in their entirety and recommend changes. Requested item 
replacements were accommodated to the extent possible while striving to maintain the integrity of the various 
linking designs required for the operational test analyses. Psychometricians were available throughout this process 
to provide guidance with regard to implications of item replacements for the linking and statistical requirements. 


Further information regarding the college- and career-ready content standards, performance level descriptors, and 
accessibility features and accommodations is provided at http://resources.newmeridiancorp.org/. 


14.3 Evidence Based on Internal Structure 


Analyses of the internal structure of a test typically involve studies of the relationships among test items and/or 
test components (i.e., subclaims) in the interest of establishing the degree to which the items or components 
appear to reflect the construct on which a test score interpretation is based (AERA, APA, & NCME, 2014, p. 16). The 
term construct is used here to refer to the characteristics that a test is intended to measure; in the case of the 
operational tests, the characteristics of interest are the knowledge and skills defined by the test blueprint for ELA/L 
and for mathematics. 


The summative assessments provide a full summative test score, Reading claim score, and Writing claim score as 
well as ELA/L subclaim and mathematics subclaim scores. The goal of reporting at this level is to provide criterion- 
referenced data to assess the strengths and weaknesses of a student’s achievement in specific components of 
each content area. This information can then be used by teachers to plan for further instruction, to plan for 
curriculum development, and to report progress to parents. The results can also be used as one factor in making 
administrative decisions about program effectiveness, teacher effectiveness, class grouping, and needs 
assessment. 


14.3.1 Intercorrelations 


The ELA/L full summative tests comprise two claim scores, Reading (RD) and Writing (WR), and five subclaim 
scores—Reading Literature (RL), Reading Information (RI), Reading Vocabulary (RV), Writing Written Expression 
(WE), and Writing Knowledge Language and Conventions (WKL). The RD claim score is a composite of RL, RI, and 
RV. The writing claim score, a composite of WE and WKL, comprises only PCR items, and the same PCR items are in 
each subclaim. The ELA/L operational test analyses were performed by evaluating the separate trait scores of WE 
and WKL, and for some PCR items also RL or RI; therefore, the trait scores were used for the intercorrelations. 


The mathematics full summative tests have four subclaim scores: Major Content (MC), Mathematical Reasoning 
(MR), Modeling Practice (MP), and Additional and Supporting Content (ASC). 


New Meridian February 28, 2020 Page 138 


2019 Technical Report 


High total group internal consistencies as well as similar reliabilities across subgroups provide additional evidence 
of validity. High reliability of test scores implies that the test items within a domain are measuring a single 
construct, which is a necessary condition for validity when the intention is to measure a single construct. Refer to 
Section 13 for reliability estimates for the overall population, subgroups of interest, as well as for claims and 
subclaims for ELA/L and subclaims for mathematics. 


Another way to assess the internal structure of a test is through the evaluation of correlations among scores. 
These analyses were conducted between the ELA/L Reading and Writing claim scores and the ELA/L subclaims (RL, 
RI, RV, WE, and WKL) and between the mathematics subclaims. If these components within a content area are 
strongly related to each other, this is evidence of unidimensionality. 


A series of tables summarizes the results for the spring 2019 administration (see footnote 18). Tables 14.1 through 
14.8 present the Pearson correlations observed between the ELA/L Reading and Writing claim scores and subclaim 
scores for each grade. The tables report weighted average intercorrelations, obtained by averaging the 
intercorrelations computed for all the core operational forms of the test within each grade level. The average 
intercorrelations appear in the lower triangle of each table, the total sample size across all forms appears in the 
upper triangle, and the reliabilities (from Section 13) are reported along the diagonal. The WR, WE, and WKL scores 
tended to be highly correlated; this is expected given that these three intercorrelations are based on the trait 
scores from the same Writing items. RL, RI, and RV, all subclaims of Reading, are moderately to highly correlated. 
Additionally, the WR claim and the WE and WKL subclaims are moderately correlated with the Reading subclaims (RL, 
RI, and RV). These moderate to high intercorrelations among the ELA/L subclaims are sufficiently high to provide 
evidence that the ELA/L tests are unidimensional. The moderate intercorrelations among the subclaims and claims 
suggest the claims may be sufficient for individual student reporting. 
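The averaging step described above can be illustrated with a minimal sketch. It assumes that each form's intercorrelation is weighted by that form's sample size; the report does not state the weights, so this is an assumption, and the data layout, function names, and toy scores are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def weighted_average_correlation(forms, score_a, score_b):
    """Average r(score_a, score_b) over forms, weighting by form sample size.

    `forms` is a list of dicts mapping a score name to a list of student
    scores (hypothetical layout). Sample-size weighting is an assumption.
    """
    total_n = sum(len(form[score_a]) for form in forms)
    weighted_sum = sum(
        len(form[score_a]) * pearson(form[score_a], form[score_b])
        for form in forms
    )
    return weighted_sum / total_n, total_n


# Toy data only: two forms with very different correlations and sizes.
form_a = {"RL": [1, 2, 3, 4], "RI": [2, 4, 6, 8]}
form_b = {"RL": [1, 2, 3, 4, 5, 6], "RI": [6, 5, 4, 3, 2, 1]}
average_r, total_n = weighted_average_correlation([form_a, form_b], "RL", "RI")
print(round(average_r, 2), total_n)
```

The function returns both the averaged correlation (a lower-triangle entry) and the pooled sample size (the matching upper-triangle entry).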


The intercorrelations and reliability estimates for mathematics are provided in Tables 14.9 through 14.18. The 
shaded values along the diagonal are the reliabilities as reported in Section 13. The average intercorrelations are 
provided in the lower portion of the table and the total sample sizes are provided in the upper portion of the table. 
Please refer to Appendix 12.1 (Form Composition) for information about the number of items and number of score 
points in each claim and subclaim. 


The mathematics intercorrelations are moderate. The main observable pattern in the mathematics 
intercorrelations is that the MC subclaim generally has slightly higher correlations with the ASC, MR, and MP 
subclaims; the intercorrelations amongst the ASC, MR, and MP subclaims are usually slightly lower. The 
mathematics intercorrelations are sufficiently high to suggest that the mathematics tests are likely to be 
unidimensional with some minor secondary dimensions. The sample sizes for ELA/L grade 11 and Integrated 
Mathematics I and III were insufficient and thus are not reported here. 


18 Addendum 14 provides a summary of results for the fall 2018 administration. 


Table 14.1 Average Intercorrelations and Reliability between Grade 3 ELA/L Subclaims 


RD RL RI RV WR WE WKL 
RD 0.89 72,440 72,440 72,440 72,440 72,440 72,440 
RL 0.93 0.79 72,440 72,440 72,440 72,440 72,440 
RI 0.89 0.72 0.73 72,440 72,440 72,440 72,440 
RV 0.87 0.72 0.68 0.67 72,440 72,440 72,440 
WR 0.74 0.69 0.70 0.58 0.84 72,440 72,440 
WE 0.73 0.68 0.69 0.57 0.99 0.81 72,440 
WKL 0.67 0.62 0.64 0.53 0.92 0.85 0.87 


Note: RD = Reading, RL = Reading Literature, RI = Reading Information, RV = Reading Vocabulary, WR = Writing, WE = Written Expression, and WKL = Writing Knowledge and Conventions. 


Table 14.2 Average Intercorrelations and Reliability between Grade 4 ELA/L Subclaims 


RD RL RI RV WR WE WKL 
RD 0.88 74,164 74,164 74,164 74,164 74,164 74,164 
RL 0.91 0.73 74,164 74,164 74,164 74,164 74,164 
RI 0.92 0.73 0.75 74,164 74,164 74,164 74,164 
RV 0.83 0.65 0.65 0.61 74,164 74,164 74,164 
WR 0.75 0.70 0.70 0.56 0.85 74,164 74,164 
WE 0.74 0.70 0.69 0.55 0.99 0.82 74,164 
WKL 0.70 0.65 0.65 0.53 0.93 0.88 0.808 


Note: RD = Reading, RL = Reading Literature, RI = Reading Information, RV = Reading Vocabulary, WR = Writing, WE = Written Expression, and WKL = Writing Knowledge and Conventions. 


Table 14.3 Average Intercorrelations and Reliability between Grade 5 ELA/L Subclaims 


RD RL RI RV WR WE WKL 
RD 0.90 75,418 75,418 75,418 75,418 75,418 75,418 
RL 0.93 0.78 75,418 75,418 75,418 75,418 75,418 
RI 0.89 0.74 0.74 75,418 75,418 75,418 75,418 
RV 0.88 0.74 0.69 0.74 75,418 75,418 75,418 
WR 0.75 0.73 0.68 0.60 0.85 75,418 75,418 
WE 0.74 0.72 0.67 0.60 0.99 0.80 75,418 
WKL 0.72 0.70 0.65 0.59 0.95 0.91 0.88 


Note: RD = Reading, RL = Reading Literature, RI = Reading Information, RV = Reading Vocabulary, WR = Writing, WE = Written Expression, and WKL = Writing Knowledge and Conventions. 


Table 14.4 Average Intercorrelations and Reliability between Grade 6 ELA/L Subclaims 


RD RL RI RV WR WE WKL 
RD 0.91 78,751 78,751 78,751 78,751 78,751 78,751 
RL 0.94 0.84 78,751 78,751 78,751 78,751 78,751 
RI 0.91 0.76 0.74 78,751 78,751 78,751 78,751 
RV 0.87 0.75 0.70 0.71 78,751 78,751 78,751 
WR 0.75 0.72 0.72 0.58 0.87 78,751 78,751 
WE 0.74 0.72 0.71 0.58 1.00 0.85 78,751 
WKL 0.74 0.71 0.71 0.58 0.97 0.95 0.88 


Note: RD = Reading, RL = Reading Literature, RI = Reading Information, RV = Reading Vocabulary, WR = Writing, WE = Written Expression, and WKL = Writing Knowledge and Conventions. 


Table 14.5 Average Intercorrelations and Reliability between Grade 7 ELA/L Subclaims 


RD RL RI RV WR WE WKL 
RD 0.90 75,078 75,078 75,078 75,078 75,078 75,078 
RL 0.93 0.79 75,078 75,078 75,078 75,078 75,078 
RI 0.93 0.78 0.80 75,078 75,078 75,078 75,078 
RV 0.86 0.71 0.72 0.66 75,078 75,078 75,078 
WR 0.77 0.73 0.75 0.60 0.88 75,078 75,078 
WE 0.77 0.72 0.75 0.59 1.00 0.87 75,078 
WKL 0.77 0.72 0.75 0.59 0.97 0.96 0.89 


Note: RD = Reading, RL = Reading Literature, RI = Reading Information, RV = Reading Vocabulary, WR = Writing, WE = Written Expression, and WKL = Writing Knowledge and Conventions. 


Table 14.6 Average Intercorrelations and Reliability between Grade 8 ELA/L Subclaims 


RD RL RI RV WR WE WKL 
RD 0.90 72,617 72,617 72,617 72,617 72,617 72,617 
RL 0.93 0.79 72,617 72,617 72,617 72,617 72,617 
RI 0.91 0.78 0.77 72,617 72,617 72,617 72,617 
RV 0.88 0.74 0.72 0.68 72,617 72,617 72,617 
WR 0.80 0.76 0.76 0.64 0.89 72,617 72,617 
WE 0.79 0.76 0.76 0.64 1.00 0.90 72,617 
WKL 0.79 0.75 0.76 0.64 0.98 0.97 0.90 


Note: RD = Reading, RL = Reading Literature, RI = Reading Information, RV = Reading Vocabulary, WR = Writing, WE = Written Expression, and WKL = Writing Knowledge and Conventions. 


Table 14.7 Average Intercorrelations and Reliability between Grade 9 ELA/L Subclaims 


RD RL RI RV WR WE WKL 
RD 0.90 3,388 3,388 3,388 3,388 3,388 3,388 
RL 0.89 0.76 3,388 3,388 3,388 3,388 3,388 
RI 0.94 0.75 0.82 3,388 3,388 3,388 3,388 
RV 0.85 0.69 0.70 0.67 3,388 3,388 3,388 
WR 0.83 0.77 0.80 0.64 0.88 3,388 3,388 
WE 0.82 0.76 0.79 0.63 1.00 0.87 3,388 
WKL 0.82 0.76 0.79 0.63 0.98 0.97 0.88 


Note: RD = Reading, RL = Reading Literature, RI = Reading Information, RV = Reading Vocabulary, WR = Writing, WE = Written Expression, and WKL = Writing Knowledge and Conventions. 


Table 14.8 Average Intercorrelations and Reliability between Grade 10 ELA/L Subclaims 


RD RL RI RV WR WE WKL 
RD 0.90 70,833 70,833 70,833 70,833 70,833 70,833 
RL 0.90 0.74 70,833 70,833 70,833 70,833 70,833 
RI 0.95 0.77 0.82 70,833 70,833 70,833 70,833 
RV 0.85 0.68 0.72 0.65 70,833 70,833 70,833 
WR 0.79 0.74 0.76 0.61 0.89 70,833 70,833 
WE 0.79 0.74 0.76 0.61 1.00 0.89 70,833 
WKL 0.79 0.74 0.76 0.61 0.98 0.97 0.90 


Note: RD = Reading, RL = Reading Literature, RI = Reading Information, RV = Reading Vocabulary, WR = Writing, WE = Written Expression, and WKL = Writing Knowledge and Conventions. 


Table 14.9 Average Intercorrelations and Reliability between Grade 3 Mathematics Subclaims 


MC ASC MR MP 
MC 0.91 79,057 79,057 79,057 
ASC 0.84 0.77 79,057 79,057 
MR 0.73 0.67 0.63 79,057 
MP 0.81 0.74 0.69 0.77 


Note: MC = Major Content, ASC = Additional and Supporting Content, MR = Mathematical Reasoning, and MP = 
Modeling Practice. 


Table 14.10 Average Intercorrelations and Reliability between Grade 4 Mathematics Subclaims 


MC ASC MR MP 
MC 0.90 80,577 80,577 80,577 
ASC 0.79 0.72 80,577 80,577 
MR 0.82 0.73 0.08 80,577 
MP 0.78 0.71 0.75 0.67 


Note: MC = Major Content, ASC = Additional and Supporting Content, MR = Mathematical Reasoning, and MP = 
Modeling Practice. 


Table 14.11 Average Intercorrelations and Reliability between Grade 5 Mathematics Subclaims 


MC ASC MR MP 
MC 0.90 81,484 81,484 81,484 
ASC 0.77 0.70 81,484 81,484 
MR 0.78 0.70 0.72 81,484 
MP 0.81 0.72 0.73 0.75 


Note: MC = Major Content, ASC = Additional and Supporting Content, MR = Mathematical Reasoning, and MP = 
Modeling Practice. 


Table 14.12 Average Intercorrelations and Reliability between Grade 6 Mathematics Subclaims 


MC ASC MR MP 
MC 0.88 78,656 78,656 78,656 
ASC 0.80 0.76 78,656 78,656 
MR 0.82 0.76 0.78 78,656 
MP 0.77 0.72 0.77 0.71 


Note: MC = Major Content, ASC = Additional and Supporting Content, MR = Mathematical Reasoning, and MP = 
Modeling Practice. 


Table 14.13 Average Intercorrelations and Reliability between Grade 7 Mathematics Subclaims 


MC ASC MR MP 
MC 0.87 62,840 62,840 62,840 
ASC 0.76 0.67 62,840 62,840 
MR 0.75 0.66 0.65 62,840 
MP 0.77 0.71 0.70 0.74 


Note: MC = Major Content, ASC = Additional and Supporting Content, MR = Mathematical Reasoning, and MP = 
Modeling Practice. 


Table 14.14 Average Intercorrelations and Reliability between Grade 8 Mathematics Subclaims 


MC ASC MR MP 
MC 0.78 39,806 39,806 39,806 
ASC 0.66 0.54 39,806 39,806 
MR 0.71 0.60 0.62 39,806 
MP 0.59 0.53 0.59 0.53 


Note: MC = Major Content, ASC = Additional and Supporting Content, MR = Mathematical Reasoning, and MP = 
Modeling Practice. 


Table 14.15 Average Intercorrelations and Reliability between Algebra I Subclaims 


MC ASC MR MP 
MC 0.81 79,937 79,937 79,937 
ASC 0.78 0.74 79,937 79,937 
MR 0.76 0.75 0.78 79,937 
MP 0.73 0.71 0.72 0.77 


Note: MC = Major Content, ASC = Additional and Supporting Content, MR = Mathematical Reasoning, and MP = 
Modeling Practice. 


Table 14.16 Average Intercorrelations and Reliability between Geometry Subclaims 


MC ASC MR MP 
MC 0.89 13,390 13,390 13,390 
ASC 0.85 0.82 13,390 13,390 
MR 0.80 0.75 0.74 13,390 
MP 0.83 0.78 0.78 0.78 


Note: MC = Major Content, ASC = Additional and Supporting Content, MR = Mathematical Reasoning, and MP = 
Modeling Practice. 


Table 14.17 Average Intercorrelations and Reliability between Algebra II Subclaims 


MC ASC MR MP 
MC 0.84 5,129 5,129 5,129 
ASC 0.83 0.80 5,129 5,129 
MR 0.76 0.75 0.71 5,129 
MP 0.71 0.72 0.69 0.69 


Note: MC = Major Content, ASC = Additional and Supporting Content, MR = Mathematical Reasoning, and MP = 
Modeling Practice. 


Table 14.18 Average Intercorrelations and Reliability between Integrated Mathematics II Subclaims 


MC ASC MR MP 
MC 0.82 190 190 190 
ASC 0.73 0.73 190 190 
MR 0.65 0.64 0.66 190 
MP 0.71 0.70 0.75 0.69 


Note: MC = Major Content, ASC = Additional and Supporting Content, MR = Mathematical Reasoning, and MP = 
Modeling Practice. 


14.3.2 Reliability 


The reliability analyses presented in Section 13 of this technical report also provide information about the 
internal consistency of the summative assessments. Internal consistency is typically measured via correlations 
among the items on an assessment and provides an indication of how much the items measure the same general 
construct. The reliability estimates, computed using coefficient alpha (Cronbach, 1951), are presented in Tables 
13.1 and 13.2 and along the diagonals of Tables 14.1 through 14.18 (see footnote 19). The average reliabilities for 
ELA/L and mathematics summative assessments range from .88 up to .95. Appendix 13.1 Tables A.13.1 through A.13.8 
summarize test reliability for groups of interest for ELA/L grades 3 through 10, and Appendix 13.1 Tables A.13.9 
through A.13.17 summarize test reliability for groups of interest for mathematics grades/courses. Along with the 
subclaim intercorrelations, the reliability estimates indicate that the items within each assessment are measuring 
the same construct and provide further evidence of unidimensionality. 


14.3.3 Local Item Dependence 


In addition to the intercorrelations for ELA/L and mathematics, local item independence was evaluated. Local 
independence, one of the primary assumptions of item response theory (IRT), states that the probability of 
success on one item is not influenced by performance on other items once ability level is controlled for. This 
implies that ability, or theta, accounts for the associations among the observed items. When present, local item 
dependence (LID) overstates the amount of information predicted by the IRT model. It can exert other 
undesirable psychometric effects and represents a threat to validity, since factors other than the construct of 
interest are present. Classical statistics are also affected when LID is present, since estimates of test reliability, 
like IRT information, can be inflated (Zenisky et al., 2003). 


The LID issue affects the choice of item scoring in IRT calibrations. Specifically, if evidence suggests that a set of 
items is locally dependent, then it might be preferable to sum the item scores into clusters or testlets as a 
method of minimizing LID. However, if the items do not appear to have strong local dependence, then 
retaining the scores as individual item scores in an IRT calibration is preferred, since more information concerning 
item properties is retained. During the initial operational administration of the summative assessments in spring 
2015, a study that included two methods of investigating the presence of LID was conducted. The methods and the 
study findings are summarized below. 


First, analyses of the internal consistency in items and testlets were conducted under classical test theory (Wainer 
& Thissen, 2001) as a way to evaluate the degree of LID. Two estimates of Cronbach’s alpha (Cronbach, 1951) were 
compared: one based on the individual items in a test and one based on the items clustered into testlets. 
Cronbach’s alpha is formulated as: 


$$\alpha = \frac{I}{I-1} \cdot \frac{\sum_{i \neq i'} \sigma_{ii'}}{\sigma_X^2} \qquad (14\text{-}1)$$ 


where $I$ is the total number of items, $\sigma_{ii'}$ is the covariance of items $i$ and $i'$ ($i \neq i'$), and 
$\sigma_X^2$ is the variance of total scores. To compute an alpha coefficient, sample standard deviations and 
variances are substituted for the $\sigma_{ii'}$ and $\sigma_X^2$. The alpha for the total test based on individual 
items is compared with the alpha based on testlets formed from larger subparts. If the item-level configuration has 
appreciably higher levels of internal consistency compared with the testlets, LID may be present. 


19 Section 13 provides information on the computations of the reliability estimates. 
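The item-versus-testlet comparison can be sketched as follows. This is an illustration under stated assumptions, not the study's code: it uses the algebraically equivalent variance form of alpha, k/(k-1) * (1 - sum of part variances / total-score variance), and the function names, the passage grouping, and the toy responses are hypothetical.

```python
def sample_variance(values):
    """Unbiased sample variance of a list of scores."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def cronbach_alpha(parts):
    """Coefficient alpha for a list of score columns (items or testlets).

    Each column holds one score per student, in the same student order.
    """
    k = len(parts)
    n_students = len(parts[0])
    totals = [sum(col[s] for col in parts) for s in range(n_students)]
    part_variance = sum(sample_variance(col) for col in parts)
    return k / (k - 1) * (1 - part_variance / sample_variance(totals))


def sum_into_testlets(items, groups):
    """Sum item columns into testlet columns; `groups` lists item indices."""
    n_students = len(items[0])
    return [
        [sum(items[i][s] for i in group) for s in range(n_students)]
        for group in groups
    ]


# Toy data: items 0-1 share one passage and items 2-3 share another, and
# the responses within each passage are identical (extreme dependence).
items = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 0],
]
testlets = sum_into_testlets(items, [[0, 1], [2, 3]])
item_alpha = cronbach_alpha(items)
testlet_alpha = cronbach_alpha(testlets)
print(round(item_alpha, 3), round(testlet_alpha, 3))
```

In this deliberately extreme toy case the item-level alpha is well above the testlet-level alpha, which is the pattern the text associates with possible LID.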


For IRT-based methods, local dependence can be evaluated using statistics such as Q3 (Yen, 1984). The item 
residual is the difference between observed and expected performance. The Q3 index is the correlation between 
the residuals of each item pair, defined as 


$$d_i = O_i - E_i \qquad (14\text{-}2)$$ 


$$Q_{3,ii'} = r(d_i, d_{i'}) \qquad (14\text{-}3)$$ 


where $O_i$ is the observed score on item $i$, $E_i$ is the expected value of $O_i$ under a proposed IRT model, and 
the index is defined as the correlation $r$ between the two item residuals. 


LID manifests itself as a residual correlation that is nonzero and large. For Q3, LID can be either positive or 
negative. Positive (negative) LID indicates that performance is higher (lower) than expectation. The residual Q3 
correlation matrix can be inspected to determine if there are any blocks of locally dependent items (e.g., perhaps 
blocks of items belonging to the same reading passage). For Q3, the null hypothesis is that local independence 
holds. The expected value of Q3 is -1/(n-1), where n is the number of items, so the statistic shows a small 
negative bias. As a rule of thumb, item pairs with |Q3| of .2 or greater show moderate levels of LID. Significant 
levels of LID are present when |Q3| is greater than .4. An alternative is to use the Fisher r-to-z 
transformation and evaluate the resulting p-values. 
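A minimal sketch of the Q3 computation follows. It assumes dichotomous items, a Rasch model, and ability and difficulty values that are already known; the report does not specify the calibration model used for this step, and all names and toy values here are hypothetical.

```python
from itertools import combinations
from math import exp, sqrt


def rasch_probability(theta, difficulty):
    """Expected score E on a 0/1 item under the Rasch model (an assumption)."""
    return 1.0 / (1.0 + exp(-(theta - difficulty)))


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def q3_by_pair(responses, thetas, difficulties):
    """Q3 for every item pair; responses[k][i] is student k's score on item i."""
    n_items = len(difficulties)
    # Residual d = O - E for each item, across students.
    residuals = [
        [responses[k][i] - rasch_probability(thetas[k], difficulties[i])
         for k in range(len(thetas))]
        for i in range(n_items)
    ]
    return {
        (i, j): pearson(residuals[i], residuals[j])
        for i, j in combinations(range(n_items), 2)
    }


def lid_level(q3):
    """Apply the rule of thumb from the text: |Q3| >= .2 and |Q3| > .4."""
    if abs(q3) > 0.4:
        return "significant"
    if abs(q3) >= 0.2:
        return "moderate"
    return "below rule of thumb"


def expected_q3(n_items):
    """Expected value of Q3 under local independence, -1/(n-1)."""
    return -1.0 / (n_items - 1)


# Toy data: items 0 and 1 have identical responses and difficulty, so their
# residuals coincide and the pair is flagged.
thetas = [-1.5, -0.5, 0.0, 0.5, 1.0, 1.5]
difficulties = [0.0, 0.0, 0.5]
responses = [[0, 0, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1], [1, 1, 1], [1, 1, 0]]
for pair, value in q3_by_pair(responses, thetas, difficulties).items():
    print(pair, round(value, 3), lid_level(value))
```

In practice the abilities would be estimated from the same responses, which is the source of the small negative bias noted above; the sketch sidesteps that by treating them as given.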


For the LID comparisons, the following eight test levels administered in spring 2015 were selected: 


• Grade 4 for span 3-5 in ELA/L, 
• Grade 4 for span 3-5 in mathematics, 
• Grade 7 for span 6-8 in ELA/L, 
• Grade 7 for span 6-8 in mathematics, 
• Grade 10 for span 9-11 in ELA/L, 
• Integrated Mathematics II for Integrated Mathematics I-III, 
• Algebra I, and 
• Algebra II. 


One spring 2015 CBT form for each of the eight tests was selected that was roughly at the median in terms of test 
difficulty. For ELA/L, reading items were summed according to passage assignment. For mathematics, items were 
summed according to subclaims. Cronbach’s alpha was computed for the entire forms using the two different 
approaches as described above, one involving calculations at the item level and the second utilizing scores on 
summed items (i.e., testlets). Further description of the data is given in Table 14.19. 


To cross-validate the internal consistency analysis, the Q3 statistic was computed from spring 2015 CBT data based 
on grade 4 ELA/L and Integrated Mathematics II items. All items in the pool at that test level were included. The CBT 
item pool for grade 4 ELA/L contained 125 items, while Integrated Mathematics II had 77 items. 


The results for the internal consistency analysis are shown in Figure 14.1. In every instance, the item-level 
Cronbach’s alpha is higher than in the testlet configuration. The greatest difference was for Algebra II, which 
showed a difference of .07. Although higher item-level values were not unexpected, the magnitude of the differences 
in the respective alpha coefficients in general does not suggest a concerning level of LID. Table 14.20 shows the 
summary for the Q3 values. Figures 14.2 and 14.3 show graphs of the distribution of Q3 values. Most of the Q3 values 
were small and negative, again suggesting that LID is not at a level of concern. For these two test levels, the 
difference in the alpha coefficients was .03 and was consistent with the low values of Q3. 


In summary, this investigation did not find evidence for the existence of pervasive LID. The results of both the 
internal consistency analyses and Q3 methods support a claim of minimal LID. For a multiple-choice-only test 
containing four reading passages with 5 to 12 items associated with a reading passage, Sireci et al. (1991) reported 
that testlet alpha was approximately 10 percent lower than the item-level coefficient. In comparison, the tests 
studied here have complex test structures and exhibited smaller differences in alpha coefficients. In addition, the 
median Q3 values presented in Table 14.20 centered around the expectation of -1/(n-1). 


Figure 14.1 Comparison of Internal Consistency by Item and Cluster (Testlet) 


Table 14.19 


Content  Grade/Course  N Valid  N Complete  Percent Incomplete  No. Items  No. Tasks  Item Rel.  Task Rel. 
ELA/L    4             13,660   13,518      1.04                31                    0.86       0.83 
ELA/L    7             12,757   12,685      0.56                41                    0.89       0.88 
ELA/L    10            3,097    3,033       2.07                41                    0.90       0.87 
Math     4             10,332   10,255      0.75                53         4          0.93       0.92 
Math     7             10,295   10,188      1.04                50         6          0.92       0.87 
Math     A1            5,072    4,885       3.69                52         6          0.90       0.85 
Math     A2            4,982    4,769       4.28                54         6          0.92       0.85 
Math     M2            2,708    2,645       2.33                51         6          0.90       0.87 


Note: A1 = Algebra I, A2 = Algebra II, M2 = Integrated Mathematics II. 
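As a quick arithmetic check on Table 14.19, the sketch below recomputes the percent incomplete and the drop from item-level to task-level reliability for each test. The values are transcribed from the table; the tuple layout and function name are hypothetical.

```python
# (content, grade/course, N valid, N complete, item rel., task rel.),
# transcribed from Table 14.19.
TABLE_14_19 = [
    ("ELA/L", "4", 13660, 13518, 0.86, 0.83),
    ("ELA/L", "7", 12757, 12685, 0.89, 0.88),
    ("ELA/L", "10", 3097, 3033, 0.90, 0.87),
    ("Math", "4", 10332, 10255, 0.93, 0.92),
    ("Math", "7", 10295, 10188, 0.92, 0.87),
    ("Math", "A1", 5072, 4885, 0.90, 0.85),
    ("Math", "A2", 4982, 4769, 0.92, 0.85),
    ("Math", "M2", 2708, 2645, 0.90, 0.87),
]


def summarize(rows):
    """Return (content, level, percent incomplete, alpha drop) per test."""
    summary = []
    for content, level, n_valid, n_complete, item_rel, task_rel in rows:
        percent_incomplete = round(100 * (n_valid - n_complete) / n_valid, 2)
        alpha_drop = round(item_rel - task_rel, 2)
        summary.append((content, level, percent_incomplete, alpha_drop))
    return summary


summary = summarize(TABLE_14_19)
largest = max(summary, key=lambda row: row[3])
for row in summary:
    print(row)
print("largest drop:", largest)
```

The recomputed differences agree with the text: the largest drop is .07 for Algebra II, and the drop is .03 for both grade 4 ELA/L and Integrated Mathematics II.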


Table 14.20 Summary of Q3 Values for ELA/L Grade 4 and Integrated Mathematics II (Spring 2015) 


                           Min.    Q1      Median  Mean    Q3      Max.   SD 
ELA/L Grade 4              -0.138  -0.047  -0.031  -0.031  -0.017  0.279  0.030 
Integrated Mathematics II  -0.160  -0.038  -0.017  -0.019   0.001  0.280  0.032 


Figure 14.2 Distribution of Q3 Values for Grade 4 ELA/L (Spring 2015) 


Figure 14.3 Distribution of Q3 Values for Integrated Mathematics II (Spring 2015) 


14.4 Evidence Based on Relationships to Other Variables 


Empirical results concerning the relationships between scores on a test and measures of other variables external 
to the test can also provide evidence of validity when these relationships are found to be consistent with the 
definition of the construct that the test is intended to measure. As indicated in the AERA, APA, and NCME 
standards (2014), the variables investigated can include other tests that measure the same construct and different 
constructs, criterion measures that scores on the test are expected to predict, as well as demographic 
characteristics of students that are expected to be related and unrelated to test performance. 


The relationship of the scores across the ELA/L and mathematics assessments was evaluated using correlational 
analyses. Tables 14.21 through 14.26 present the Pearson correlations observed between the ELA/L scale scores 
and the mathematics scale scores for each grade. For grades 3 through 8, students must have a valid test score for 
both ELA/L and mathematics at the same grade level to be included in the tables. These tables provide the 
correlation in the lower triangle and the sample size is provided in the upper triangle. In computing the 
correlations between a particular pair of ELA/L and mathematics tests, students must have taken both tests in 
spring 2019. ELA/L, Reading (RD), and Writing (WR) are moderately to highly correlated with mathematics; the 
correlations range from .60 up to .81 for grades 3 through 8. These correlations suggest that the ELA/L and 
mathematics tests are assessing different content. The higher intercorrelations between the ELA/L, Reading (RD), 
and Writing (WR) scores suggest stronger internal relationships when compared to the correlations with the 
mathematics content area. 
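The matching rule described above (only students with a valid score on both tests enter the correlation) can be sketched as follows. The student identifiers, scale scores, and function names are hypothetical toy values, not data from the assessments.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def matched_correlation(ela_scores, math_scores):
    """Correlate scale scores for students with valid scores on both tests.

    Both arguments map a student ID to a scale score. Returns the
    correlation (lower-triangle entry) and the matched sample size
    (upper-triangle entry).
    """
    matched_ids = sorted(set(ela_scores) & set(math_scores))
    ela = [ela_scores[s] for s in matched_ids]
    math_ = [math_scores[s] for s in matched_ids]
    return pearson(ela, math_), len(matched_ids)


# Toy data: s5 lacks a mathematics score and s6 lacks an ELA/L score,
# so both are dropped before the correlation is computed.
ela_scores = {"s1": 700, "s2": 720, "s3": 750, "s4": 780, "s5": 800}
math_scores = {"s1": 690, "s2": 730, "s3": 740, "s4": 790, "s6": 760}
r, n = matched_correlation(ela_scores, math_scores)
print(round(r, 2), n)
```

Returning the matched count alongside the correlation mirrors the table layout, in which each correlation is paired with its sample size.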


The ELA/L and mathematics correlations for the high school tests are presented in Tables 14.27 through 14.29. 
Because students in high school can take the mathematics courses in different years (e.g., one student may take 
Algebra I in grade 9 while another student may take Algebra I in grade 10), the high school mathematics scores 
were correlated with several of the ELA/L grades (e.g., Algebra I correlated with both grades 9 and 10). Only 
correlations for pairings with total sample sizes of at least 100 are shown in the tables. In grades 8 through 11, 
ELA/L, Reading (RD), and Writing (WR) scores have correlations with high school mathematics tests that range 
from .32 to .73. Overall, the correlations between high school mathematics scores and the corresponding ELA/L 
scores are low to moderate. 


Table 14.21 Correlations between ELA/L and Mathematics for Grade 3 


ELA/L RD WR MA 
ELA/L - 72,198 72,198 72,198 
RD 0.96 - 72,198 72,198 
WR 0.89 0.76 - 72,198 
MA 0.79 0.77 0.70 - 


Note: ELA/L = English language arts/literacy, RD = Reading, WR = Writing, MA = Mathematics. 


Table 14.22 Correlations between ELA/L and Mathematics for Grade 4 


ELA/L RD WR MA 
ELA/L - 73,949 73,949 73,949 
RD 0.96 - 73,949 73,949 
WR 0.90 0.75 - 73,949 
MA 0.80 0.79 0.70 - 


Note: ELA/L = English language arts/literacy, RD = Reading, WR = Writing, MA = Mathematics. 


Table 14.23 Correlations between ELA/L and Mathematics for Grade 5 


ELA/L RD WR MA 
ELA/L - 75,188 75,188 75,188 
RD 0.96 - 75,188 75,188 
WR 0.87 0.74 - 75,188 
MA 0.79 0.78 0.66 - 


Note: ELA/L = English language arts/literacy, RD = Reading, WR = Writing, MA = Mathematics. 


Table 14.24 Correlations between ELA/L and Mathematics for Grade 6 


ELA/L RD WR MA 
ELA/L - 77,913 77,913 77,913 
RD 0.96 - 77,913 77,913 
WR 0.87 0.74 - 77,913 
MA 0.80 0.81 0.66 - 


Note: ELA/L = English language arts/literacy, RD = Reading, WR = Writing, MA = Mathematics. 


Table 14.25 Correlations between ELA/L and Mathematics for Grade 7 


ELA/L RD WR MA 
ELA/L - 62,025 62,025 62,025 
RD 0.96 - 62,025 62,025 
WR 0.89 0.75 - 62,025 
MA 0.77 0.77 0.65 - 


Note: ELA/L = English language arts/literacy, RD = Reading, WR = Writing, MA = Mathematics. 


Table 14.26 Correlations between ELA/L and Mathematics for Grade 8 


ELA/L RD WR MA 
ELA/L - 39,091 39,091 39,091 
RD 0.95 - 39,091 39,091 
WR 0.88 0.72 - 39,091 
MA 0.70 0.69 0.60 - 


Note: ELA/L = English language arts/literacy, RD = Reading, WR = Writing, MA = Mathematics. 


Table 14.27 Correlations between ELA/L and Mathematics for High School 
Mathematics 


ELA/L A1 GO A2 M1 M2 M3 
8 0.69 0.58 0.46 
(23,140) (6,492) (116) 

9 0.67 0.76 0.62 0.66 

(2,365) (591) (118) (141) 
10 0.54 0.66 0.56 0.56 

(9,401) (4,665) (900) (104) 
11 


Note: ELA/L = English language arts/literacy, A1 = Algebra |, GO = Geometry, A2 = Algebra II, M1 = Integrated 
Mathematics |, M2 = Integrated Mathematics Il, M3 = Integrated Mathematics III 


Table 14.28 Correlations between ELA/L Reading and Mathematics for High School 
Mathematics 


RD A1 GO A2 M1 M2 M3
8 0.69 0.57 0.49
(23,140) (6,492) (116)
9 0.65 0.73 0.59 0.68
(2,365) (591) (118) (141)
10 0.53 0.66 0.58 0.61
(9,401) (4,665) (900) (104)
11


Note: RD = Reading, A1 = Algebra I, GO = Geometry, A2 = Algebra II, M1 = Integrated Mathematics I, M2 =
Integrated Mathematics II, M3 = Integrated Mathematics III.


Table 14.29 Correlations between ELA/L Writing and Mathematics for High School 
Mathematics 


WR A1 GO A2 M1 M2 M3
8 0.60 0.49 0.33
(23,140) (6,492) (116)
9 0.56 0.68 0.59 0.49
(2,365) (591) (118) (141)
10 0.45 0.56 0.43 0.32
(9,401) (4,665) (900) (104)
11


Note: WR = Writing, A1 = Algebra I, GO = Geometry, A2 = Algebra II, M1 = Integrated Mathematics I, M2 =
Integrated Mathematics II, M3 = Integrated Mathematics III.
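

The grade-level tables above appear to report correlations below the diagonal and sample sizes above it. A minimal
sketch of how such a table could be assembled is shown here. The scores are hypothetical, the helper names
pearson and correlation_table are assumptions introduced for illustration, and the sketch assumes every student
has a score on every measure; it is not the operational analysis code.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_table(scores):
    """Return {(row, col): value} with correlations below the diagonal and N above it."""
    names = list(scores)
    table = {}
    for i, row in enumerate(names):
        for j, col in enumerate(names):
            if i > j:
                table[(row, col)] = round(pearson(scores[row], scores[col]), 2)
            elif i < j:
                # Assumes complete data, so N is the same for every pair.
                table[(row, col)] = len(scores[row])
    return table


# Hypothetical scale scores for five students (illustrative values only).
scores = {
    "ELA/L": [720, 735, 750, 765, 790],
    "MA": [715, 740, 745, 770, 780],
}
table = correlation_table(scores)
```

Here table[("MA", "ELA/L")] holds the correlation and table[("ELA/L", "MA")] holds the number of students, which
mirrors the layout of the grade-level tables.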


14.5 Evidence from the Special Studies 


Several research studies were conducted to provide additional validity evidence for the participating states and
agencies’ goals of assessing more rigorous academic expectations, helping to prepare students for college and
careers, and providing information to teachers and parents about their students’ progress toward college and
career readiness. Some of the special studies conducted include:


• content alignment studies,
• a benchmarking study,
• a longitudinal study of external validity,
• a mode comparability study,
• a device comparability study, and
• Quality Testing Standards study.


The following paragraphs briefly describe each of these studies. 


14.5.1 Content Alignment Studies 


In 2016, the content of the ELA/L assessments at grades 5, 8, and 11 and of the Algebra II and Integrated
Mathematics II assessments was evaluated to determine how well the assessments were aligned to the Common Core
State Standards (CCSS; Doorey & Polikoff, 2016; Schultz et al., 2016). These content alignment studies were
conducted by the Fordham Institute for grades 5 and 8 and by the Human Resources Research Organization (HumRRO)
for the high school assessments. Both studies used the same methodology: content experts reviewed the
assessment items and answers (for the constructed-response items, the rubrics were reviewed) and then judged
how well the items aligned to the CCSS, the depth of knowledge of the items, and the accessibility of the items to
all students, including English learners and students with disabilities. The authors of both studies noted that
the content experts reviewing the assessments were required to be familiar with the CCSS but could not be
employed by participating organizations or be the writers of the CCSS, in an effort to eliminate potential
conflicts of interest.


In the content studies, each content expert first reviewed and rated each item individually; the experts then
came to a consensus as a group on the final ratings for content alignment, depth of knowledge, and accessibility
to all students. In addition to the ratings, the content experts were asked to provide comments explaining their
ratings; the full group of content experts then used these comments to write narrative comments regarding the
overall ratings and to provide feedback and recommendations about the assessment programs.


The assessment program was rated as Excellent Match for ELA/L content and depth and Good Match for 
mathematics content and depth for grades 5 and 8. However, for grade 11 ELA/L content was rated as Excellent 
Match but depth was rated as Limited/Uneven Match. The high school mathematics assessments were rated as
Excellent Match for content and Good Match for depth.


The content studies noted both strengths and weaknesses of the assessments. For ELA/L, it was noted that the
assessments include complex texts, a range of cognitive demands, and a variety of item types. Furthermore,
the ELA/L “assessments require close reading, assess writing to sources, research, and inquiry, and emphasize
vocabulary and language skills” (Doorey & Polikoff, 2016). The grade 11 ELA/L assessment had a smaller range of
depth and included items assessing the higher-demand cognitive level. A weakness of the ELA/L assessments is the
lack of a listening and speaking component. It was also suggested that the ELA/L assessments could be enhanced
by the inclusion of a research task that requires the use of two or more sources of information.


The strengths of the mathematics assessments include alignment to the major work for each
grade level. While the grade 5 assessment includes a range of cognitive demand, the grade 8 assessment includes a
number of higher-demand items and may not fully assess the standards at the lowest level of cognitive demand. It
was suggested that the grade 5 assessment could include more focus on the major work and the grade 8
assessment could include items at the lowest cognitive demand level. Additionally, the reviewers noted that some
of the mathematics items should be carefully reviewed for editorial and mathematical accuracy.


The high school report noted that the assessment program incorporates a number of accessibility features and test 
accommodations for students with disabilities and for English learners. Furthermore, the assessments included 
items designed to accommodate the needs of students with disabilities. 


In 2017, HumRRO conducted a study to evaluate the quality and alignment of ELA/L and mathematics assessments
for grades 3, 4, 6, and 7 (Schultz et al., 2017). This alignment study followed a methodology similar to that of
the 2016 study. Reviewers were asked to determine the extent to which items were aligned to the CCSS, using fully,
partially, or not aligned as the rating categories. Ratings were averaged to determine overall alignment. For ELA/L,
99.6 percent of grade 3 and 4 items, 95.5 percent of grade 6 items, and 94.6 percent of grade 7 items were fully
aligned. For mathematics, 92.0 percent of grade 3 items, 91.1 percent of grade 4 items, 83.1 percent of grade 6
items, and 94.0 percent of grade 7 items were fully aligned. The majority of the items that were not rated fully
aligned were considered partially aligned to the standards. CCSS are designed to be measured by multiple items, so
items that aligned to multiple CCSS received a partially aligned rating. The overall item-to-CCSS alignment was
captured by a holistic alignment rating that indicated whether an item captured the identified standards as a set.
Holistic ratings (either yes or no) were found by averaging reviewer ratings across clusters for items that
included more than one standard. For ELA/L, for all four grades, at least 93 percent of items had a holistic
alignment rating of yes, indicating that the identified standards captured the skills or knowledge required. For
mathematics, grade 6 had the lowest percentage for the holistic alignment rating of yes (84.8 percent), and grade 7
had the highest (96.3 percent). Overall, the alignment study suggests that the identified CCSS capture the
knowledge and skills required in the items.


In addition to the alignment study, HumRRO also evaluated the CCSSO criteria for content and depth for ELA/L and
mathematics at grades 3, 4, 6, and 7, as well as the cognitive complexity levels at these same grades (Schultz et
al., 2017). There are five criteria for ELA/L content: close reading, writing, vocabulary and language skills,
research and inquiry, and speaking and listening. Reviewers were asked to rate the content as Excellent, Good,
Limited/Uneven, or Weak Match. For grades 3, 4, 6, and 7, the ELA/L assessments received a composite rating of
Excellent Match for assessing the content needed for college and career readiness. There are four criteria for
ELA/L depth: text quality and types, complexity of texts, cognitive demand, and high-quality items and item
variety. All grades in this study received a composite rating of Good Match for depth. For mathematics content,
the composite rating is based on two criteria: (1) focus and (2) concepts, procedures, and applications. Grades 3,
4, and 6 received a composite content rating of Good Match, and grade 7 received a composite content rating of
Excellent Match. The mathematics composite depth rating is based on three criteria: connecting practice to content,
cognitive demand, and high-quality items and item variety. All grades in the study were rated as Excellent Match
at assessing the depth needed to successfully meet college and career readiness.


Finally, the 2017 HumRRO study looked at cognitive complexity of the items on ELA/L and mathematics at grades
3, 4, 6, and 7 (Schultz et al., 2017). For the study, cognitive complexity was consistent with the current
assessments’ definition. An item’s cognitive complexity is a measure of the rigor of an individual item based on the
amount of text a student must process from the corresponding passage to answer the item correctly, the way in
which students are expected to interact with the item’s functionality, and the linguistic demands and reading load
that exist within the components of the item itself. Reviewers indicated their agreement with the intended
cognitive complexity ratings provided by participating states and agencies of low, medium, or high. The results
indicated that the reviewers generally agreed with the distribution of complexity levels. There were differences in
agreement in the ELA/L language cluster and a few exceptions to agreement in mathematics, particularly at grade 6,
where there was disagreement in the ratings at the medium complexity level for two domains and the high complexity
level for one domain. For grade 7, there was agreement across low, medium, and high in all domains.


14.5.2 Benchmarking Study 


The purpose of the benchmarking study (McClarty et al., 2015) was to provide information that would inform the 
performance level setting (PLS) process. An evidence-based standard setting approach (EBSS; McClarty et al., 2013)
was used to establish the performance levels for the assessments. In EBSS, the threshold scores for performance
levels are set based on a combination of empirical research evidence and expert judgment. This benchmarking 
study provided one source of empirical evidence to inform the college- and career-readiness performance level 
(i.e., Level 4). The study findings were provided to a pre-policy standard-setting committee. The charge of this 
committee was to suggest a reasonable range for the percentage of students meeting or exceeding the Level 4 
threshold score and therefore considered college- and career-ready. Section 8.3.2 of this report provides more 
information about the pre-policy meeting. 


For the benchmarking study, external information was analyzed to provide information about the Level 4 threshold 
scores for the grade 11 ELA/L, Algebra II, and Integrated Mathematics III assessments, the grade 8 ELA/L and 
mathematics assessments, and the grade 4 ELA/L and mathematics assessments. The assessments and Level 4 
expectations were compared with comparable assessments and expectations for the Programme for International
Student Assessment (PISA), Trends in International Mathematics and Science Study (TIMSS), Progress in
International Reading Literacy Study (PIRLS), National Assessment of Educational Progress (NAEP), ACT, SAT, the 
Michigan Merit Exam, and the Virginia End-of-Course exams. For each external assessment, the best-matched 
performance level was determined and the percentage of students reaching that level across the nation and in the 
participating states and agencies was determined. Across all grades and subjects, the data indicated approximately 
25 to 50 percent of students were college- and career-ready or on track to readiness based on the Level 4 
expectations. 


For details on how the benchmarking study was used during the standard setting process, refer to Section 8 of this 
technical report. 


14.5.3 Longitudinal Study of External Validity of Performance Levels (Phase 1) 


In 2016-2017, the first phase of a two-part external validity study of claims about the alignment of Level 4 to 
college readiness was completed (Steedle et al., 2017) using the summative assessment scores from the 2014-2015
and 2015-2016 academic years. Associations between the performance levels and college-readiness
benchmarks established by the College Board and ACT were used to study the claim that students who achieve 
Level 4 have a .75 probability of attaining at least a C in entry-level, credit-bearing, postsecondary coursework. 
Regression estimates measured the relationship between the summative assessment scores and external test 
scores. The Level 4 benchmark was used to estimate the expected score on an external test, and vice versa. 
Assessment scores were dichotomized for additional analyses. Cross-tabulation tables provided classification 
agreement among tests. Logistic regression modeled the relationship between students’ summative scores and 
their probabilities of meeting the external assessment benchmark, and vice versa. 


These methods were used to make the following comparisons in mathematics: Algebra I and PSAT10 Math;
Geometry and PSAT10 Math; Algebra II and PSAT10 Math; Algebra II and PSAT/NMSQT Math; Algebra II and SAT
Math; and Algebra II and ACT Math. The classification agreement (meeting the benchmark on both tests or not
meeting the benchmark on both tests) ranged from 62.5 percent to 86.5 percent. The overall trend indicated that
students who met the benchmark on a mathematics assessment were likely to meet or exceed the benchmark on
an external test (probabilities ranged from .509 to .886). However, students who met the benchmark on the
external test had relatively low probabilities of meeting the mathematics benchmark (.097 to .310).


The following comparisons were made in ELA/L: grade 9 and PSAT10 evidence-based reading and writing (EBRW);
grade 10 and PSAT10 EBRW; grade 10 and PSAT/NMSQT EBRW; grade 10 and SAT EBRW; grade 11 and
PSAT/NMSQT EBRW; grade 11 and SAT EBRW; grade 11 and ACT English; and grade 11 and ACT reading. In the
majority of comparisons, the trend in ELA/L results was similar to mathematics. The classification agreements
ranged from 67.3 percent to 79.7 percent. Students meeting the ELA/L benchmark had probabilities between .667
and .825 of meeting the benchmark on the external assessment. However, students who met the benchmark on the
external test had lower probabilities of meeting the benchmark on the ELA/L assessments (.326 to .513).
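

The classification agreement and the two conditional probabilities described above can all be read from a
two-by-two cross-tabulation of benchmark indicators. The following is a minimal sketch with hypothetical counts;
the function name benchmark_agreement and the data are assumptions for illustration and do not reproduce the
study's analysis. It assumes at least one student met each benchmark, so neither denominator is zero.

```python
def benchmark_agreement(met_summative, met_external):
    """Summarize a 2x2 cross-tabulation of two benchmark indicators (lists of bools)."""
    pairs = list(zip(met_summative, met_external))
    n = len(pairs)
    both = sum(1 for s, e in pairs if s and e)
    neither = sum(1 for s, e in pairs if not s and not e)
    n_summative = sum(1 for s, _ in pairs if s)
    n_external = sum(1 for _, e in pairs if e)
    return {
        # Share of students classified the same way by both tests.
        "agreement": (both + neither) / n,
        # P(met external benchmark | met summative benchmark).
        "p_external_given_summative": both / n_summative,
        # P(met summative benchmark | met external benchmark).
        "p_summative_given_external": both / n_external,
    }


# Hypothetical counts: 20 met both, 5 met the summative benchmark only,
# 40 met the external benchmark only, and 35 met neither.
met_summative = [True] * 20 + [True] * 5 + [False] * 40 + [False] * 35
met_external = [True] * 20 + [False] * 5 + [True] * 40 + [False] * 35
summary = benchmark_agreement(met_summative, met_external)
```

In this made-up example, far more students meet the external benchmark than the summative one, so the probability
of meeting the external benchmark given the summative benchmark is high while the reverse probability is low. This
is the same kind of asymmetry the study reports.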


Overall, results indicated that a student meeting the benchmark on the summative assessment had a high
probability of meeting the benchmark on the external test, but for the majority of comparisons the converse did
not hold for students meeting the benchmark on the external test. These results suggest that meeting the
summative benchmark is an indicator of academic readiness for college. However, it may be that students who
meet the summative benchmark have a greater than .75 probability of earning a C or higher in first-year college
courses.


Phase 1 is a preliminary study using indirect comparisons; therefore, there are limitations to interpretations. Phase 
2 of this study was to occur in 2018 and use longitudinal data including academic performance in entry-level 
college courses for students who took the summative assessments during high school. Currently, this study is on 
hold due to challenges obtaining student academic data from entry-level college courses and/or matching the data 
to the student summative scores. 


14.5.4 Mode and Device Comparability Studies 


The summative assessments have been operational since the 2014-2015 school year. In addition to the traditional 
paper format, the assessments were available for online administration via a variety of electronic devices, 
including desktop computers, laptop computers, and tablets. The research agenda includes several studies 
evaluating the interchangeability of scale scores across modes and devices. 


This report describes a two-pronged study consisting of a mode comparability analysis and a device comparability 
analysis. In the mode comparability analysis, scores arising from the paper administration were compared to those 
arising from any type of online administration. In the device comparability analysis, online scores arising from tests
administered using a tablet were compared with online scores arising from any other type of electronic
administration where a tablet was not present (i.e., laptops, desktops, Chromebooks).


The goal of this study was threefold: 1) to investigate whether assessment items were of similar difficulty across 
the levels of conditions for each analysis (i.e., paper and online for the mode comparability analysis and tablet and 
non-tablet for the device comparability analysis); 2) to determine whether the psychometric properties of test 
scores were similar across the levels of conditions for each analysis; and 3) to determine whether overall test
performance was similar across the levels of conditions for each analysis. 


This study examined performance on 12 assessments, split evenly between mathematics and ELA/L. Students were 
matched on demographic variables as well as the score from the summative assessment in the same content area 
in the prior year, creating comparable samples that allowed for an unbiased comparison of performance across 
different conditions. 
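

The report does not specify the matching algorithm. As one possible illustration, the sketch below pairs each
paper student with an online student who has identical demographic values and the closest prior-year score within
a tolerance. The function name match_students, the field names, the caliper value, and all data are hypothetical
assumptions, and greedy one-to-one matching is only one of several reasonable approaches.

```python
def match_students(focal, pool, caliper=5):
    """Greedy 1:1 matching: exact on demographics, nearest prior-year score within caliper."""
    available = list(pool)
    matches = []
    for student in focal:
        candidates = [
            c for c in available
            if c["demo"] == student["demo"]
            and abs(c["prior"] - student["prior"]) <= caliper
        ]
        if not candidates:
            continue  # focal students without an acceptable match are dropped
        best = min(candidates, key=lambda c: abs(c["prior"] - student["prior"]))
        available.remove(best)  # match without replacement
        matches.append((student["id"], best["id"]))
    return matches


# Hypothetical records: demographics as a tuple, prior-year scale score as an int.
paper = [
    {"id": "P1", "demo": ("F", "EL"), "prior": 740},
    {"id": "P2", "demo": ("M", "non-EL"), "prior": 760},
    {"id": "P3", "demo": ("M", "EL"), "prior": 700},
]
online = [
    {"id": "O1", "demo": ("F", "EL"), "prior": 742},
    {"id": "O2", "demo": ("M", "non-EL"), "prior": 759},
    {"id": "O3", "demo": ("M", "non-EL"), "prior": 763},
    {"id": "O4", "demo": ("M", "EL"), "prior": 720},
]
pairs = match_students(paper, online)
```

In this example P3 remains unmatched because the only online student with the same demographic values has a
prior-year score outside the caliper. Dropping such students trades sample size for comparability.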


The results of the mode comparability analysis were mixed and found to be consistent with prior research. The 
item means suggested that items were of similar difficulty on paper and online modes. Only two items were 
flagged for mode effects, both of which were on the mathematics assessments. C-level differential item 
functioning (DIF) was present in both analyses. All the items flagged for C-level DIF in the mathematics 
assessments favored the online students, whereas the majority of items flagged for C-level DIF in the ELA/L 
assessments favored the paper students. An examination of test reliability displayed comparable reliability values 
between the two modes; none of the test forms were flagged for mode effects with respect to test reliability. The 
test-level adjustment analysis as well as the change of the paper students’ performance levels after the adjustment 
constants were applied to the paper students’ scores indicated that more scale scores were adjusted downward 
than were adjusted upward on the paper test form for each assessment except grades 5 and 7 mathematics. 
However, all adjustments were less than the minimum standard error of Theta except for grade 11 ELA/L, which 
was the same as the minimum standard error of Theta. Therefore, the adjustments are within measurement 
precision for each assessment. 


The results of the device comparability study revealed consistent evidence supporting the comparability between 
the tablet condition (TC) and the non-tablet condition (NTC). Specifically, the item means suggested that items 
were similarly difficult for the TC and NTC, and none of the items were flagged for device effects. The DIF analysis 
revealed that none of the items had C-level DIF. Consistent with the findings at the item level, an examination of 
test reliability indicated that the TC and NTC test forms were similarly reliable and that none of the test forms were 
flagged for device effects. Furthermore, the test-level adjustment analysis as well as the change of the students’ 
performance levels after the adjustment constants were applied did not indicate strong evidence of device effects. 


The generalizability of the findings from this study may be limited due to the small sample size of both the paper 
students (for mode comparability) and the tablet students (for device comparability) at the high-school grades; 
however, it appears that high-quality matching supports the internal validity of this study’s findings. For mode and
device comparability, few to no items were flagged for mode or device effects, the psychometric properties
of test scores were similar across assessment conditions, and any adjustments to student performance for the 
paper or tablet condition were within measurement precision. 


14.6 Evidence Based on Response Processes 


As noted in the AERA, APA, and NCME Standards (2014), additional support for a particular score interpretation or 
use can be provided by theoretical and empirical evidence indicating that students are using the intended 
response processes when responding to the items in a test. This type of evidence may be gathered from 
interacting with students in order to understand what processes underlie their item responses. Evidence may also 
be derived from feedback provided by test proctors/teachers involved in the administration of the test and raters 
involved in the scoring of constructed-response items. Evidence may also be gathered by evaluating the correct 
and incorrect responses to short constructed-response items (e.g., items requiring a few words to respond) or by
evaluating the response patterns to multi-part items. 


New Meridian has undertaken research investigating the quality of the items, tasks, and stimuli, focusing on 
whether students interact with items/tasks as intended, whether they were given enough time to complete the 
assessments, and the degree to which scoring rubrics allow accurate and reliable scoring. In addition, the 
accessibility of the test for students with disabilities and English learners has been examined. This research has 
included examining students’ understanding of the format of the assessments and the use of technology. 


One such effort involved a series of four component studies conducted to evaluate the
usability and effect of a drawing tool for online mathematics items. The purpose of these studies was to determine
whether results could support the use of the drawing tool, which is a way to expand students’ ability to
demonstrate their understanding and reasoning, thereby enhancing accessibility and construct validity of the
assessment. This goal is in keeping with guidance from the Common Core State Standards (CCSS) and the National
Council of Teachers of Mathematics (NCTM) that students should have multiple paths and tools available to express
their responses. Additionally, the drawing tool was intended to boost comparability across modes.


The first two studies (Brandt, Bercovitz, McNally, & Zimmerman, 2015; Brandt, Bercovitz, & Zimmerman, 2015) 
focused on evaluating the usability of the tool itself both in the general population and among students with
low-vision and fine motor impairment disabilities. During these studies, detailed information regarding the
functionality of the tool was collected and it was determined that the items should be tested operationally. 


The third and fourth studies (Steedle & LaSalle, 2016; Minchen et al., 2018) involved evaluating the effect of the 
tool in the context of the operational assessments. The third study was conducted in grade 3 and the fourth study 
was conducted in grades 4 and 5. To evaluate the drawing tool in context, a set of items was studied by field
testing them with and without the drawing tool. The drawing tool version of each item was randomly assigned to 
students so that comparisons could be made. The goal was to explore the impact of the drawing tool on item 
performance. In general, the results showed that the drawing tool usually did not have a significant impact on 
performance or item statistics. Items with access to the drawing tool, however, did show longer response times for 
grades 4 and 5, prompting a limitation to be placed on the number of drawing tool items in each unit. 


Several other research efforts have investigated questions relevant to response processes evidence. Descriptions
of the research conducted can be found online.²⁰


14.7 Interpretations of Test Scores 


The summative assessment scores are expressed as scale scores (both total scores and claim scores), along with 
performance levels to describe how well students met the academic standards for their grade level. Additionally,
information on specific skills (the subclaims) is provided and is reported as Below Expectations, Nearly Meets
Expectations, and Meets or Exceeds Expectations. On the basis of a student’s total score, an inference is drawn 
about how much knowledge and skill in the content area the student has acquired. The total score is also used to 
classify students in terms of their level of knowledge and skill in the content area as students progress in their K-12 
education. These levels are called performance levels and are reported as: 


• Level 5: Exceeded expectations
• Level 4: Met expectations
• Level 3: Approached expectations
• Level 2: Partially met expectations
• Level 1: Did not yet meet expectations


²⁰ Various research is described at: http://resources.newmeridiancorp.org/


Students classified as either Level 4 or Level 5 are meeting or exceeding the grade level expectations. Performance 
level descriptors (PLDs) assist with the understanding and interpretations of the ELA/L scores 
(https://resources.newmeridiancorp.org/ela-test-design/) and mathematics scores 
(https://resources.newmeridiancorp.org/math-test-design/). Additionally, resource information is available online 
to educators, parents, and students (http://resources.newmeridiancorp.org/). Section 12 of this technical report 
provides more information on the scale scores and the subclaim scores. 


14.8 Evidence Based on the Consequences to Testing 


The consequences of testing should also be investigated to support the validity evidence for the use of the
summative assessments, as the Standards note that tests are usually administered “with the expectation that some
benefit will be realized from the intended use of the scores” (AERA, APA, & NCME, 2014). When this is the case,
evidence that the expected benefits accrue will provide support for the intended use of the scores. Evidence of the
consequences of testing will also accrue with the continued implementation of the CCSS and the continued
administration of the assessments.


Consequences of the tests may vary by state or by school district. For example, some states may require “passing” 
the assessments as one of several criteria for high school graduation, while other states/districts may not require 
students to “pass” the assessments for high school graduation. Additionally, some school districts may use the 
scores along with other information such as school grades and teacher recommendations for placing students into 
special programs (e.g., remedial support, gifted and talented program) or for course placement (e.g., Algebra I in
grade 8). Because the consequences for the assessments can vary by each state, it is suggested that each member 
state provide school districts, teachers, parents, and students with information on how to interpret and use the 
scores. Additionally, the states should monitor how scores are used to ensure that the scores are being used as 
intended. 


14.9 Summary 


In this section of the technical report, several aspects of validity were included, such as validity evidence based on 
content, the internal structure of the assessments, relationships across the content assessments, and evidence 
from special studies. 


The item development process involved educators, assessment experts, and bias and sensitivity experts in review 
of text, items, and tasks for accuracy, appropriateness, and freedom from bias. Several studies were conducted
during item development to evaluate the process (e.g., technological
functionalities, answer time required, and student experiences). Additionally, items were field tested prior to the
initial operational administration, and data and feedback from students, test administrators, and classroom
teachers were used to improve the operational administration of the items and to inform future item development.
The multiple item and form reviews conducted by educators and studies to evaluate item administration help to 
ensure the integrity of the assessments. 


The intercorrelations of the subclaims, the reliability analyses, and the local item dependence analyses indicated
that the ELA/L and the mathematics assessments are both essentially unidimensional. Furthermore, the
correlations between ELA/L and mathematics indicated that the two assessments are measuring different content.


Several studies were conducted as part of the assessment program (e.g., benchmarking study, content 
evaluation/alignment studies, longitudinal study, and mode and device comparability studies). The benchmarking 
study was conducted in support of the standard setting meeting. This study indicated students performing at or 
above Level 4 could be considered to be college- and career-ready or on track to readiness. 


The content evaluation/alignment studies performed by the Fordham Institute and HumRRO indicate that the 
assessments are good to excellent matches to the CCSS in terms of content and depth of knowledge. Thus, the 
assessments are assessing the college- and career-readiness standards. However, the reports noted that the 
program could improve by adding a wider range of depth of knowledge to some of the assessments. The reports 
also suggested enhancing the ELA/L assessments by including a research task that requires the use of two or more 
sources of information. 


In the longitudinal study of external validity, associations between the performance levels and college-readiness 
benchmarks established by the College Board and ACT were used to study the claim that students who achieve 
Level 4 have a .75 probability of attaining at least a C in entry-level, credit-bearing, postsecondary coursework. In
the first phase of the study, the relationship between the summative assessment and external tests was studied. 
Overall, results indicated that a student meeting the benchmark on the summative assessment had a high 
probability of making the benchmark on the external test, but the converse did not hold for students meeting the 
benchmark on the external test, for the majority of comparisons. These results suggest that meeting the 
benchmark is an indicator of academic readiness for college. In the next phase of the study, the relationship 
between scores and performance in first-year college courses will be explored. 


The mode comparability study indicated that comparability across modes was inconsistent across content
domains and grade levels; these mixed results are consistent with prior research. The device comparability
study revealed consistent evidence supporting the comparability between the tablet condition (TC) and the
non-tablet condition (NTC). In both the mode and device comparability studies, few to no items were flagged for
mode or device effects, the psychometric properties of test scores were similar across assessment conditions,
and any adjustments to student performance for the paper or tablet condition were within measurement precision.


In addition to the validity information presented in this section of the technical report, other information in 
support of the uses and interpretations of the scores appears in the following sections: 


• Section 5 provides information concerning the test characteristics based on classical test theory. 
• Section 6 provides information regarding the differential item functioning (DIF) analyses. 
• Section 11 presents information regarding student characteristics for the spring administration of the 
ELA/L and mathematics assessments. 
• Section 12 provides detailed information concerning the scores that were reported and the cut scores for 
ELA/L and mathematics. 
• Section 13 provides information on the test reliability (total test score and for subclaims) and includes 
information on the interrater reliability/agreement. 


The technical report addendum provides the student characteristics and test reliability (total test score and for 
subclaims) for the 2018 fall block administration. 


New Meridian February 28, 2020 Page 162 


2019 Technical Report 


Section 15: Student Growth Measures 


Student growth percentiles (SGPs) are normative measures of annual progress. Normative measures are useful in 
answering questions like “How does my academic progress compare with the academic progress of my peers?” In 
contrast to criterion-referenced measures of growth, which describe academic growth toward a particular goal, 
norm-referenced measures of growth describe students’ growth relative to that of students who performed 
similarly in the past (Betebenner, 2009). 


SGPs measure individual student progress by tracking student scores from one year to the next. SGPs compare a 
student’s performance to that of his or her academic peers both within the state and across the consortium. 
Academic peers are defined as students in the norm group who took the same assessment as the student in prior 
years and achieved a similar score. 


Some participating states or agencies chose to implement norm groups based on their respective student data. 
State-specific SGP results are not reported in this Technical Report. As a result, SGPs were only summarized for 
states using norm groups based on the consortium. The following sections describe the norm groups, the 
estimation procedure, and the results for SGPs based on consortium norm groups. 


The SGP describes a student’s location in the distribution of current test scores for all students who performed 
similarly in the past. SGPs indicate the percentage of academic peers above whom the student scored. With a 
range of 1 to 99, higher numbers represent higher growth and lower numbers represent lower growth. For 
example, an SGP of 60 on grade 7 ELA/L means that the student scored better than 60 percent of the students in the 
state or consortium who took grade 7 ELA/L in spring 2018 and who had achieved scores similar to this student’s on 
the grade 6 ELA/L assessment in spring 2017 and the grade 5 ELA/L assessment in spring 2016.²¹ An SGP of 50 
represents typical (median) student growth for the state or consortium. Because students are only compared with 
other students who performed similarly in the past, all students, regardless of starting point, can demonstrate high 
or low growth. 
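
The interpretation above can be illustrated with a minimal sketch. It assumes a small, hypothetical list of academic peers and simply counts how many of them scored below the student. The function name and the scores are illustrative only; operational SGPs are estimated with quantile regression rather than by counting literal peers.

```python
def percent_of_peers_below(student_score, peer_scores):
    """Percentage of academic peers whose current score is below the student's."""
    if not peer_scores:
        raise ValueError("peer_scores must not be empty")
    below = sum(1 for score in peer_scores if score < student_score)
    return 100.0 * below / len(peer_scores)


# Hypothetical current-year scale scores for ten students with similar prior scores.
peers = [712, 720, 725, 731, 738, 742, 750, 755, 761, 770]

# A student scoring 745 outscored six of the ten hypothetical peers.
growth = percent_of_peers_below(745, peers)
```

In this hypothetical norm group, the student scored better than 60 percent of peers, which corresponds to the reading of an SGP of 60 given above.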


The 2018-2019 academic year is the fifth year of test administration. Students in states that participated in spring 
2017 and spring 2018 generally received SGPs based on two prior scores. Students in states that participated only in 
spring 2018 received SGPs based on one prior score. Students who do not have a previous test score, including 
any new students and all grade 3 students, do not receive an SGP. 


15.1 Norm Groups 


The norm groups consisted of students with the same prior scores based on grade or content area progressions 
(academic peers). SGPs were based on up to two years of prior test scores from spring 2017 and spring 2018 
administrations. States administering traditional mathematics assessments in fall 2017 or fall 2018 may also have 
SGPs based on these prior scores. Tables 15.1—15.8 list the grade or content area progressions required for SGPs 
based on one prior or two prior test scores for ELA/L grades 3 through 11, mathematics grades 3 through 8, 
Algebra I, Geometry, Algebra II, Integrated Mathematics I, II, and III, respectively. In general, the progressions of 


21 Note: Because regression modeling is used to establish the relationship between prior and current scores, the SGP is for 
students with the exact same prior scores. This often leads to confusion among non-technical stakeholders who often ask, 
“How many students are there with exactly the same prior scores?” To avoid explaining regression to non-technical 
stakeholders, the phrase “similar scores” is often used to convey the idea of regression without mentioning it. 
grade levels and content areas are consecutive. The traditional and integrated mathematics courses have 
progressions that are not consecutive but reflect student progression for high school mathematics courses. SGPs 
were calculated for all norm groups with at least 1,000 students. Some progressions did not meet the minimum 
sample size for SGP calculations. 


Table 15.1 ELA/L Grade-Level Progressions for One- and Two-year Prior Test Scores 


Two Prior Year Test Scores | One Prior Year Test Score | Current Year Test Score 
N/A | N/A | Grade 3* 
N/A | Grade 3 | Grade 4 
Grades 3 and 4 | Grade 4 | Grade 5 
Grades 4 and 5 | Grade 5 | Grade 6 
Grades 5 and 6 | Grade 6 | Grade 7 
Grades 6 and 7 | Grade 7 | Grade 8 
Grades 7 and 8 | Grade 8 | Grade 9 
Grades 8 and 9 | Grade 9 | Grade 10 
Grades 9 and 10 | Grade 10 | Grade 11 


*SGP not calculated for grade 3 since there are no prior scores. 


Table 15.2 Mathematics Grade-Level Progressions for One- and Two-year Prior Test Scores 


Two Prior Year Test Scores | One Prior Year Test Score | Current Year Test Score 
N/A | N/A | Grade 3* 
N/A | Grade 3 | Grade 4 
Grades 3 and 4 | Grade 4 | Grade 5 
Grades 4 and 5 | Grade 5 | Grade 6 
Grades 5 and 6 | Grade 6 | Grade 7 
Grades 6 and 7 | Grade 7 | Grade 8 


*SGP not calculated for grade 3 since there are no prior scores. 
Table 15.3 Algebra I Grade/Content Area Progressions for One- and Two-year Prior Test Scores 


Two Prior Year Test Scores | One Prior Year Test Score | Current Year Test Score 
Grades 5 and 6 | Grade 6 | Algebra I 
Grades 6 and 7 | Grade 7 | Algebra I 
Grades 6 or 7 and 8 | Grade 8 | Algebra I 
Grades 6, 7, or 8 and Geometry | Geometry | Algebra I 
Grade 8 and Integrated Mathematics I | Integrated Mathematics I | Algebra I 
Integrated Mathematics I and Integrated Mathematics II | Integrated Mathematics II | Algebra I 


Table 15.4 Geometry Grade/Content Area Progressions for One- and Two-year Prior Test Scores 


Two Prior Year Test Scores | One Prior Year Test Score | Current Year Test Score 
Grades 5 and 6 | Grade 6 | Geometry 
Grades 6 and 7 | Grade 7 | Geometry 
Grades 6 or 7 and 8 | Grade 8 | Geometry 
Grades 6, 7, or 8 and Algebra I | Algebra I | Geometry 
Grade 8 and Integrated Mathematics I | Integrated Mathematics I | Geometry 
Integrated Mathematics I and Integrated Mathematics II | Integrated Mathematics II | Geometry 


Table 15.5 Algebra II Grade/Content Area Progressions for One- and Two-year Prior Test Scores 


Two Prior Year Test Scores | One Prior Year Test Score | Current Year Test Score 
Grades 6 and 7 | Grade 7 | Algebra II 
Grades 7 and 8 | Grade 8 | Algebra II 
Grades 7 or 8 and Algebra I | Algebra I | Algebra II 
Grade 8 or Algebra I and Geometry | Geometry | Algebra II 
Grade 8 and Integrated Mathematics I | Integrated Mathematics I | Algebra II 
Integrated Mathematics I and Integrated Mathematics II | Integrated Mathematics II | Algebra II 
Table 15.6 Integrated Mathematics I Grade/Content Area Progressions for One- and Two-year Prior Test Scores 


Two Prior Year Test Scores | One Prior Year Test Score | Current Year Test Score 
Grades 5 and 6 | Grade 6 | Integrated Mathematics I 
Grades 6 and 7 | Grade 7 | Integrated Mathematics I 
Grades 6 or 7 and 8 | Grade 8 | Integrated Mathematics I 
Grades 7 or 8 and Algebra I | Algebra I | Integrated Mathematics I 
Grade 8 or Algebra I and Geometry | Geometry | Integrated Mathematics I 


Table 15.7 Integrated Mathematics II Grade/Content Area Progressions for One- and Two-year Prior Test Scores 


Two Prior Year Test Scores | One Prior Year Test Score | Current Year Test Score 
Grades 6 and 7 | Grade 7 | Integrated Mathematics II 
Grades 7 and 8 | Grade 8 | Integrated Mathematics II 
Grades 7 or 8 and Integrated Mathematics I | Algebra I | Integrated Mathematics II 


Table 15.8 Integrated Mathematics III Grade/Content Area Progressions for One- and Two-year Prior Test Scores 


Two Prior Year Test Scores | One Prior Year Test Score | Current Year Test Score 
Grades 6 and 7 | Grade 7 | Integrated Mathematics III 
Grades 7 and 8 | Grade 8 | Integrated Mathematics III 
Grades 7 or 8 and Integrated Mathematics I | Algebra I | Integrated Mathematics III 
Integrated Mathematics I and Integrated Mathematics II | Integrated Mathematics II | Integrated Mathematics III 


In addition to the above progressions, in 2018 the State Leads approved a state-specific SGP progression for one 
state. In this state, grade 9 students are not required to take the test. Therefore, grade 10 students were not 
receiving an SGP. For this state, both mathematics and ELA/L progressions were adjusted (see Table 15.9) such that 
the grade 10 students would receive growth estimates. Other states were not affected by this change. 


Table 15.9 State-specific SGP Progressions 


Two Prior Test Scores | One Prior Test Score | Current Test Score 
ELA/L Grades 7 and 8 | ELA/L Grade 8 | ELA/L Grade 10 
Mathematics Grade 7 and 8 | Mathematics Grade 8 | Geometry 
Mathematics Grade 7 and Algebra I | Algebra I | Geometry 


15.2 Student Growth Percentile Estimation 


SGPs are calculated using quantile regression, which describes the conditional distribution of the response variable 
with greater precision than traditional linear regression, which describes only the conditional mean (Betebenner, 
2009). This application of quantile regression uses B-spline smoothing to fit a curvilinear relationship between a 
norm group’s prior and current scores. Cubic B-spline basis functions are used when calculating SGPs to better 
model the heteroscedasticity, nonlinearity, and skewness in assessment data. 
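
The contrast between quantile regression and traditional linear regression can be illustrated with a simplified sketch. It assumes an intercept-only model (no prior scores and no B-splines) and hypothetical scale scores, and it uses a brute-force search over integer candidates. Minimizing the check (pinball) loss recovers a quantile of the scores, whereas minimizing squared error recovers a value near the mean. All names and data here are illustrative and are not taken from the operational analysis.

```python
def pinball_loss(tau, observed, predicted):
    """Average check (pinball) loss of a single predicted value at quantile tau."""
    total = 0.0
    for y in observed:
        residual = y - predicted
        # Under-predictions are weighted by tau, over-predictions by (1 - tau).
        total += tau * residual if residual >= 0 else (tau - 1.0) * residual
    return total / len(observed)


def squared_loss(observed, predicted):
    """Average squared deviation of a single predicted value."""
    return sum((y - predicted) ** 2 for y in observed) / len(observed)


# Hypothetical, right-skewed current-year scale scores for one norm group.
scores = [700, 710, 720, 730, 740, 750, 760, 770, 780, 790, 900]
candidates = range(690, 911)

best_for_median = min(candidates, key=lambda c: pinball_loss(0.5, scores, c))
best_for_p90 = min(candidates, key=lambda c: pinball_loss(0.9, scores, c))
best_for_mean = min(candidates, key=lambda c: squared_loss(scores, c))
```

In this sketch the check loss at 0.5 is minimized at the median score (750), the check loss at 0.9 is minimized at a high quantile (790), and the squared loss is minimized near the mean (759), which the single extreme score pulls upward.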
For each group, the quantile regression fits 100 relationships (one for each percentile) between students’ prior and 
current scores. The result is a single coefficient matrix that relates students’ prior achievement to their current 
achievement at each percentile. The National Center for the Improvement of Educational Assessment (NCIEA) 
performed the analyses using Betebenner’s (2009) non-linear quantile-regression based SGP. The analysis was 
done in the SGP package in R (Betebenner et al., 2017). For details on student growth percentiles, see 
Betebenner’s A Technical Overview of the Student Growth Percentile Methodology: Student Growth Percentiles and 
Percentile Growth Projections/Trajectories (2011). 


Betebenner’s (2009) SGP model uses Koenker’s (2005) quantile regression approach to estimate the conditional 
density associated with a student’s score at administration t conditioned on the student’s prior score(s). Quantile 
regression functions represent the solution to a loss function much like least squares regression represents the 
solution to a minimization of squared deviations. The conditional quantile functions are parametrized as a linear 
combination of B-spline basis functions (Wei & He, 2006) to smooth irregularities found in the data. For scores 
from administration t (where t ≥ 2), the τth quantile function for Y_t conditional on prior scores (Y_{t-1}, ..., Y_1) is 


Q_{Y_t}(\tau \mid Y_{t-1}, \ldots, Y_1) = \sum_{j=1}^{t-1} \sum_{i} \phi_{ij}(Y_j)\, \beta_{ij}(\tau) (15-1) 


where \phi_{ij} (i = 1, 2, ..., n students; j = 1, ..., t−1 administrations) represent the B-spline basis functions. The SGP of 
each student i is the midpoint between the two consecutive τ whose quantile scores capture the student’s 
current score, multiplied by 100. For example, a student with a current score that lies between the fitted value for 
τ = .595 and τ = .605 would receive an SGP of 60. 
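
The midpoint rule can be sketched as follows. The sketch assumes a grid of 100 quantiles at .005, .015, ..., .995 (inferred from the example above) and a hypothetical list of fitted conditional quantile scores for one student. Clamping results to the 1 to 99 range and counting a score that equals a fitted value as exceeding it are assumptions of this illustration, not documented features of the SGP package.

```python
from bisect import bisect_right

# Assumed quantile grid: .005, .015, ..., .995 (100 values).
TAUS = [(2 * k + 1) / 200 for k in range(100)]


def sgp_from_fitted_quantiles(current_score, fitted_scores):
    """Return the SGP implied by nondecreasing fitted scores, one per entry of TAUS."""
    if len(fitted_scores) != len(TAUS):
        raise ValueError("expected one fitted score per quantile in TAUS")
    # k is the number of fitted quantile scores at or below the current score.
    k = bisect_right(fitted_scores, current_score)
    if k == 0:
        return 1  # below every fitted quantile: lowest reportable SGP
    if k == len(TAUS):
        return 99  # above every fitted quantile: highest reportable SGP
    midpoint = (TAUS[k - 1] + TAUS[k]) / 2
    return round(100 * midpoint)


# Hypothetical fitted scores: 700 at tau = .005, rising one point per quantile.
fitted = [700 + k for k in range(100)]

# 759.5 lies between the fitted values at tau = .595 (759) and tau = .605 (760).
example_sgp = sgp_from_fitted_quantiles(759.5, fitted)
```

With these hypothetical fitted values, the example student receives an SGP of 60, matching the worked example in the text.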


SGPs are assumed to be uniformly distributed and uncorrelated with prior achievement. Scale score conditional 
standard errors of measurement (CSEMs) were incorporated for calculation of SGP standard errors of 
measurement (SEMs). Goodness of fit results were checked (i.e., uniform distribution of SGPs by prior 
achievement) for indications of ceiling/floor effects for each SGP norm-group analysis. 


15.3 Student Growth Percentile Results/Model Fit for Total Group 


The estimation of SGPs was conducted for each student who had at least one prior score. Each analysis is defined 
by the norm cohort group (grade/sequence). A goodness of fit plot is produced for each analysis run. A 
ceiling/floor effects test identifies potential problems at the highest obtainable scale scores (HOSS) and lowest 
obtainable scale scores (LOSS). Other fit plots compare the observed conditional density of SGP estimates with the 
theoretical uniform density. If there is perfect model fit, 10 percent of the estimated growth percentiles are 
expected within each decile band. A Q-Q plot compares the observed distribution with the theoretical distribution; 
ideally the step function lines do not deviate much from the ideal line of perfect fit. 
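
The decile-band expectation can be checked with a short sketch. It assumes a hypothetical list of SGPs for one group of students and the bands 1-9, 10-19, ..., 90-99. The band boundaries and function names are assumptions made for illustration and may differ from those used in the operational fit plots.

```python
def decile_band_percentages(sgps):
    """Return the percentage of SGPs in each of ten bands: 1-9, 10-19, ..., 90-99."""
    if not sgps:
        raise ValueError("sgps must not be empty")
    counts = [0] * 10
    for sgp in sgps:
        if not 1 <= sgp <= 99:
            raise ValueError("SGPs range from 1 to 99")
        counts[int(sgp) // 10] += 1
    return [100.0 * count / len(sgps) for count in counts]


def largest_departure_from_uniform(percentages):
    """Largest absolute gap between a band's percentage and the expected 10 percent."""
    return max(abs(p - 10.0) for p in percentages)


# Hypothetical SGPs spread evenly over 1 to 99, as expected under good model fit.
hypothetical_sgps = list(range(1, 100)) * 3

band_percentages = decile_band_percentages(hypothetical_sgps)
worst_gap = largest_departure_from_uniform(band_percentages)
```

A large gap in any band, particularly among students with very high or very low prior achievement, would point toward the ceiling or floor effects described above.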


Tables 15.10 and 15.11 summarize SGP estimates for the total testing group for ELA/L and mathematics, 
respectively. SGPs were calculated at the consortium level and, if sample size was sufficient, the state level. 
Median SGPs ranged from 39 to 60, with most having a median of approximately 50. If the model is a perfect fit, 
the median is expected to be 50 with norm-referenced data. The minimum SGP is 1 and the maximum SGP is 99. 
The average standard error for the SGPs is within expectations for these models. 
In general, SGPs can be divided into three categories: an SGP below 30 indicates that a student did not achieve a year’s 
worth of growth, an SGP of 30 to 70 indicates that a student achieved a year’s worth of growth, and an SGP over 70 
indicates that the student surpassed a year’s worth of growth. It is important to note that definitions such as 
these are not inherent to the SGP method, but rather require expert judgment (Betebenner, 2009). The observed 
standard errors, ranging from 10.48 to 16.47, support these interpretations (Betebenner et al., 2016). 
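
The three categories can be expressed as a small sketch. The cut points of 30 and 70 come from the text above; the function name and category labels are illustrative assumptions.

```python
def growth_category(sgp):
    """Classify an SGP using the judgment-based cut points of 30 and 70."""
    if not 1 <= sgp <= 99:
        raise ValueError("SGPs range from 1 to 99")
    if sgp < 30:
        return "less than a year's worth of growth"
    if sgp <= 70:
        return "a year's worth of growth"
    return "more than a year's worth of growth"


# Boundary values: 30 and 70 both fall in the middle category.
labels = [growth_category(value) for value in (29, 30, 70, 71)]
```

Because the cut points rest on expert judgment rather than on the SGP method itself, a different program could substitute other thresholds without changing the estimation.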


Table 15.10 Summary of ELA/L SGP Estimates for Total Group 


Grade Level | Sample Size | Average SGP | Average Standard Error | Median SGP 
4 | 5,800 | 54.10 | 12.78 | 55 
5 | 5,623 | 52.00 | 13.42 | 53 
6 | 5,224 | 53.46 | 13.29 | 55 
7 | 4,599 | 57.74 | 12.74 | 61 
8 | 4,333 | 51.64 | 13.61 | 53 
9 | -- | -- | -- | -- 
10 | 3,197 | 43.21 | 10.48 | 40 
11 | -- | -- | -- | -- 


Note: “--” indicates insufficient sample for SGP calculation for these tests. 
Table 15.11 Summary of Mathematics SGP Estimates for Total Group 


Grade/Course Level | Sample Size | Average SGP | Average Standard Error | Median SGP 
4 | 5,751 | 51.11 | 12.64 | 51 
5 | 5,588 | 51.08 | 13.45 | 51 
6 | 5,203 | 45.15 | 14.49 | 43 
7 | 4,470 | 51.15 | 15.27 | 51 
8 | 3,446 | 46.89 | 16.47 | 46 
A1 | 843 | 42.70 | 13.71 | 39 
GO | 2,372 | 53.71 | 15.14 | 53 
A2 | -- | -- | -- | -- 
M1 | -- | -- | -- | -- 
M2 | -- | -- | -- | -- 
M3 | -- | -- | -- | -- 


Note: “--” indicates insufficient sample for SGP calculation for these tests. 
15.4 Student Growth Percentile Results for Subgroups of Interest 


Median SGPs are provided for subgroups of interest. With norm-referenced data, the median of all SGPs is 
expected to be close to 50. Median subgroup growth percentiles below 50 represent growth lower than the 
median, and median growth percentiles above 50 represent growth higher than the median. Table 15.12 
summarizes SGPs for groups of interest for ELA/L grade 4. The ELA/L tables for grades 5-8 and 10 are provided in 
the Appendix (Tables A.15.1 — A.15.6). Table 15.13 summarizes SGPs for groups of interest for mathematics grade 
4; the other mathematics subgroup results are provided in the Appendix (Tables A.15.7 — A.15.13). Median SGPs 
for subgroups of interest fell within the band of 30—70, which is considered to be adequate growth. ELA/L grades 9 
and 11, Algebra II, and Integrated Mathematics had insufficient sample size for SGP subgroup results to be 
reported. 
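
Subgroup medians of this kind can be computed with a minimal sketch. It assumes hypothetical (subgroup, SGP) records; the names and values are illustrative only. A subgroup with an even number of students can have a median that ends in .5, because the median is then the average of the two middle SGPs.

```python
from collections import defaultdict
from statistics import median


def median_sgp_by_subgroup(records):
    """Return the median SGP for each subgroup in (subgroup, sgp) records."""
    grouped = defaultdict(list)
    for subgroup, sgp in records:
        grouped[subgroup].append(sgp)
    return {name: median(values) for name, values in grouped.items()}


# Hypothetical student records: three in one subgroup, four in the other.
records = [
    ("Female", 40), ("Female", 59), ("Female", 72),
    ("Male", 35), ("Male", 50), ("Male", 53), ("Male", 81),
]

medians = median_sgp_by_subgroup(records)
```

In this hypothetical data the three-student subgroup has a median of 59, while the four-student subgroup has a median of 51.5.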


15.4.1 SGP Results for Gender 


English Language Arts/Literacy 
The median SGPs for females tend to be higher than the median SGPs for males. The median SGP for females 
ranges from 40 to 64, whereas the median SGP for males ranges from 40 to 57. The standard errors for males and 
females are comparable to that of the total group. 


Mathematics 
There was no consistent pattern between median SGPs for females and males. The median SGP for females ranges 
from 40 to 54, and the median SGP for males ranges from 37 to 53. The standard errors for both are similar to that of the 
total group. 


15.4.2 SGP Results for Ethnicity 


English Language Arts/Literacy 
The African American group median SGP ranges from 38 to 59, with students in higher grades at the higher end of the range. 
Asian/Pacific Islanders tend to have the highest median SGPs, over 60 for all tests but grade 10. The American 
Indian/Alaska Native group has insufficient sample size for the majority of the grade levels. The median SGP for 
Hispanics ranges from 41 to 63. For all ethnicity groups, standard errors are similar to that of the total group. 


Mathematics 
The median SGP for African Americans ranges from 34 to 51, with the highest growth in mathematics grade 7. 
Asian/Pacific Islanders tend to have the highest SGPs across all tests, with a minimum of 49 and a maximum of 
76.5. The American Indian/Alaska Native group has insufficient sample size for the majority of the grade levels. The median 
SGP for Hispanics ranges from 33 to 65. For all ethnicity groups, the standard errors are under 19 points. 


15.4.3 SGP Results for Special Instructional Needs 


English Language Arts/Literacy 
Economically disadvantaged and English language learner students tend to have lower median SGPs than the 
general population. The median SGP ranges from 38 to 59 for economically disadvantaged students and from 47 to 
60 for English language learners. For students with disabilities, the median SGP ranges from 40 to 52. The standard errors 
for special instructional needs subgroups are similar to those observed for the total group. 
Mathematics 
Economically disadvantaged and English language learner students tend to have lower median SGPs than the 
general population. The median SGP ranges from 33 to 49 for economically disadvantaged students and from 39 to 
65 for English language learners. For students with disabilities, the median SGP ranges from 40 to 57, whereas for students 
without disabilities the median SGP ranges from 38 to 53. The standard errors for special education students are 
similar to that of the total group. 


Table 15.12 Summary of SGP Estimates for Subgroups: Grade 4 ELA/L 


Subgroup | Total Sample Size | Average SGP | Average Standard Error | Median SGP 
Gender 
Male | 2,974 | 52.06 | 12.90 | 53 
Female | 2,826 | 56.25 | 12.65 | 59 
Ethnicity 
White | 720 | 67.94 | 11.71 | 74 
African American | 3,853 | 49.32 | 13.10 | 49 
Asian/Pacific Islander | 86 | 67.02 | 11.63 | 74 
American Indian/Alaska Native | -- | -- | -- | -- 
Hispanic | 994 | 59.56 | 12.58 | 63 
Multiple | -- | -- | -- | -- 
Special Instructional Needs 
Economically Disadvantaged | 4,470 | 51.21 | 13.00 | 52 
Not-economically Disadvantaged | 1,330 | 63.81 | 12.05 | 69 
English Learner | 760 | 57.89 | 12.70 | 60 
Non English Learner | 5,040 | 53.53 | 12.79 | 54 
Students with Disabilities | 1,241 | 45.72 | 13.63 | 43 
Students without Disabilities | 4,559 | 56.38 | 12.55 | 58 


15.4.4 SGP Results for Students Taking Spanish Forms 


Mathematics 
There is a wide range of median growth percentiles for students taking Spanish forms. The sample size is less than 
50 for all grade levels. These forms had a slightly higher standard error on average, likely due to lower sample 
sizes. 
Table 15.13 Summary of SGP Estimates for Subgroups: Grade 4 Mathematics 


Subgroup | Total Sample Size | Average SGP | Average Standard Error | Median SGP 
Gender 
Male | 2,952 | 50.92 | 12.70 | 50 
Female | 2,799 | 51.32 | 12.58 | 52 
Ethnicity 
White | 719 | 61.02 | 13.18 | 65 
Black/African American | 3,844 | 47.99 | 12.53 | 47 
Asian/Pacific Islander | 86 | 64.55 | 13.00 | 75.5 
American Indian/Alaska Native | -- | -- | -- | -- 
Hispanic | 955 | 54.09 | 12.57 | 56 
Multiple | -- | -- | -- | -- 
Special Instructional Needs 
Economically Disadvantaged | 4,424 | 49.54 | 12.55 | 49 
Not-economically Disadvantaged | 1,327 | 56.38 | 12.97 | 61 
English Learner | 722 | 54.87 | 12.62 | 56.5 
Non English Learner | 5,029 | 50.58 | 12.65 | 50 
Students with Disabilities | 1,231 | 46.48 | 13.28 | 46 
Students without Disabilities | 4,520 | 52.38 | 12.47 | 53 
Spanish Language Form | 40 | 41.58 | 12.47 | 40.5 
Appendix 6: Summary of Differential Item Functioning (DIF) Results 


Table A.6.1 Pre-Administration Differential Item Functioning for ELA/L Grade 3 


DIF Comparison | Total N of Unique Items | C- DIF: N, % of Total | B- DIF: N, % of Total | A DIF: N, % of Total | B+ DIF: N, % of Total | C+ DIF: N, % of Total 
Male vs Female 52 2 50 96 2 
White vs Black 52 1 2 51 98 
White vs Hispanic 52 3 6 49 94 
White vs Asian 52 2 51 98 
White vs Amerindian 52 52 100 
White vs Pacific Islander 52 1 2 50 96 2 
White vs Multiracial 52 52 100 
NoEcnDis vs EcnDis 52 52 100 
ELN vs ELY 52 2 4 50 96 
SWDN vs SWDY 52 52 100 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically 
disadvantaged, ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 
Table A.6.2 Pre-Administration Differential Item Functioning for ELA/L Grade 4 


DIF Comparison | Total N of Unique Items | C- DIF: N, % of Total | B- DIF: N, % of Total | A DIF: N, % of Total | B+ DIF: N, % of Total | C+ DIF: N, % of Total 
Male vs Female 68 . . 5 7 62 91 1 4 
White vs Black 68 . . 1 1 67 99 
White vs Hispanic 68 . . 1 1 67 99 
White vs Asian 68 1 1 2 3 65 96 
White vs Amerindian 68 . . 2 3 66 97 
White vs Pacific Islander 68 . . 1 1 67 99 
White vs Multiracial 68 1 1 . . 67 99 
NoEcnDis vs EcnDis 68 . . . . 68 100 
ELN vs ELY 68 . . 1 1 67 99 
SWDN vs SWDY 68 . . 3 4 65 96 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically 
disadvantaged, ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 


Table A.6.3 Pre-Administration Differential Item Functioning for ELA/L Grade 5 


DIF Comparison | Total N of Unique Items | C- DIF: N, % of Total | B- DIF: N, % of Total | A DIF: N, % of Total | B+ DIF: N, % of Total | C+ DIF: N, % of Total 
Male vs Female 61 2 3 2 3 56 92 1 2 
White vs Black 61 1 2 5 8 55 90 
White vs Hispanic 61 1 2 3 5 57 93 
White vs Asian 61 . . 1 2 59 97 1 2 
White vs Amerindian 61 . . . . 61 100 
White vs Pacific Islander 61 1 2 1 2 58 95 1 2 
White vs Multiracial 61 . . . . 61 100 
NoEcnDis vs EcnDis 61 . . 4 7 57 93 
ELN vs ELY 61 5 8 3 5 53 87 
SWDN vs SWDY 61 1 2 3 5 57 93 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically 
disadvantaged, ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 
Table A.6.4 Pre-Administration Differential Item Functioning for ELA/L Grade 6 


DIF Comparison | Total N of Unique Items | C- DIF: N, % of Total | B- DIF: N, % of Total | A DIF: N, % of Total | B+ DIF: N, % of Total | C+ DIF: N, % of Total 
Male vs Female 71 1 1 8 11 61 86 1 1 
White vs Black 71 1 1 3 4 66 93 1 1 
White vs Hispanic 71 1 1 3 4 67 94 
White vs Asian 71 . 1 1 69 97 1 1 
White vs Amerindian 71 6 8 64 90 1 1 
White vs Pacific Islander 71 . 1 1 70 99 
White vs Multiracial 71 . 71 100 
NoEcnDis vs EcnDis 71 . 71 100 
ELN vs ELY 71 1 1 2 3 68 96 
SWDN vs SWDY 71 71 100 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically 
disadvantaged, ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 


Table A.6.5 Pre-Administration Differential Item Functioning for ELA/L Grade 7 


DIF Comparison | Total N of Unique Items | C- DIF: N, % of Total | B- DIF: N, % of Total | A DIF: N, % of Total | B+ DIF: N, % of Total | C+ DIF: N, % of Total 
Male vs Female 57 1 2 5 9 51 89 
White vs Black 57 . . 2 4 55 96 
White vs Hispanic 57 1 2 3 5 53 93 
White vs Asian 57 . . . . 56 98 1 2 
White vs Amerindian 57 3 5 . . 53 93 1 2 
White vs Pacific Islander 57 . . 1 2 55 96 1 2 
White vs Multiracial 57 . . . . 57 100 
NoEcnDis vs EcnDis 57 . . 1 2 56 98 
ELN vs ELY 57 2 4 6 11 48 84 1 2 
SWDN vs SWDY 57 . . . . 57 100 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically 
disadvantaged, ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 
Table A.6.6 Pre-Administration Differential Item Functioning for ELA/L Grade 8 


DIF Comparison | Total N of Unique Items | C- DIF: N, % of Total | B- DIF: N, % of Total | A DIF: N, % of Total | B+ DIF: N, % of Total | C+ DIF: N, % of Total 
Male vs Female 67 . . 3 4 62 93 2 3 
White vs Black 67 . . 2 3 65 97 
White vs Hispanic 67 . . 4 6 63 94 
White vs Asian 67 . . . . 65 97 2 3 
White vs Amerindian 67 . . 5 7 62 93 
White vs Pacific Islander 67 . . . . 67 100 
White vs Multiracial 67 . . . . 67 100 
NoEcnDis vs EcnDis 67 . . . . 67 100 
ELN vs ELY 67 2 3 6 9 59 88 
SWDN vs SWDY 67 . . . . 67 100 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically 
disadvantaged, ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 


Table A.6.7 Pre-Administration Differential Item Functioning for ELA/L Grade 9 


DIF Comparison | Total N of Unique Items | C- DIF: N, % of Total | B- DIF: N, % of Total | A DIF: N, % of Total | B+ DIF: N, % of Total | C+ DIF: N, % of Total 
Male vs Female 81 2 2 8 10 68 84 3 4 
White vs Black 81 . . 4 5 77 95 
White vs Hispanic 81 . . 3 4 78 96 
White vs Asian 81 1 1 . . 79 98 1 1 
White vs Amerindian 81 2 2 3 4 76 94 
White vs Pacific Islander 81 . . 3 4 78 96 
White vs Multiracial 81 . . . . 81 100 
NoEcnDis vs EcnDis 81 . . 1 1 80 99 
ELN vs ELY 81 1 1 8 10 71 88 1 1 
SWDN vs SWDY 81 . . 1 1 80 99 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically 
disadvantaged, ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 
Table A.6.8 Pre-Administration Differential Item Functioning for ELA/L Grade 10 


DIF Comparison | Total N of Unique Items | C- DIF: N, % of Total | B- DIF: N, % of Total | A DIF: N, % of Total | B+ DIF: N, % of Total | C+ DIF: N, % of Total 
Male vs Female 58 2 3 5 9 50 86 1 2 
White vs Black 58 . . . . 57 98 1 2 
White vs Hispanic 58 . . 1 2 57 98 
White vs Asian 58 1 2 . . 57 98 
White vs Amerindian 58 1 2 1 2 55 95 1 2 
White vs Pacific Islander 58 1 2 . . 57 98 
White vs Multiracial 58 . . . . 58 100 
NoEcnDis vs EcnDis 58 1 2 . . 57 98 
ELN vs ELY 58 1 2 3 5 54 93 
SWDN vs SWDY 58 . . . . 58 100 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically 
disadvantaged, ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 


Table A.6.9 Pre-Administration Differential Item Functioning for ELA/L Grade 11 


DIF Comparison | Total N of Unique Items | C- DIF: N, % of Total | B- DIF: N, % of Total | A DIF: N, % of Total | B+ DIF: N, % of Total | C+ DIF: N, % of Total 
Male vs Female 55 1 2 4 7 49 89 1 2 
White vs Black 55 . . 3 5 52 95 
White vs Hispanic 55 1 2 2 4 50 91 2 4 
White vs Asian 55 . . . . 53 96 2 4 
White vs Amerindian 55 3 5 1 2 50 91 1 2 
White vs Pacific Islander 55 . . . . 55 100 
White vs Multiracial 55 . . . . 55 100 
NoEcnDis vs EcnDis 55 1 2 . . 54 98 
ELN vs ELY 55 3 5 4 7 48 87 
SWDN vs SWDY 55 . . . . 55 100 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically 
disadvantaged, ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 
Table A.6.10 Post-Administration Differential Item Functioning for ELA/L Grade 3 


DIF Comparison | Total N of Unique Items | C- DIF: N, % of Total | B- DIF: N, % of Total | A DIF: N, % of Total | B+ DIF: N, % of Total | C+ DIF: N, % of Total 
Male vs Female 52 1 2 . . 51 98 
White vs Black 52 . . 1 2 51 98 
White vs Hispanic 52 . . 1 2 51 98 
White vs Asian 52 . . . . 52 100 
White vs Amerindian 52 . . 1 2 51 98 
White vs Pacific Islander 52 1 2 3 6 47 90 1 2 
White vs Multiracial 52 . . . . 52 100 
NoEcnDis vs EcnDis 52 . . . . 52 100 
ELN vs ELY 52 . . 3 6 49 94 
SWDN vs SWDY 52 . . 1 2 51 98 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically 
disadvantaged, ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 


Table A.6.11 Post-Administration Differential Item Functioning for ELA/L Grade 4 


DIF Comparison | Total N of Unique Items | C- DIF: N, % of Total | B- DIF: N, % of Total | A DIF: N, % of Total | B+ DIF: N, % of Total | C+ DIF: N, % of Total 
Male vs Female 68 . . 3 4 62 91 3 4 
White vs Black 68 . . 1 1 67 99 
White vs Hispanic 68 . . 2 3 66 97 
White vs Asian 68 1 1 . . 67 99 
White vs Amerindian 68 . . 1 1 67 99 
White vs Pacific Islander 68 . . 2 3 65 96 1 1 
White vs Multiracial 68 . . . . 68 100 
NoEcnDis vs EcnDis 68 . . . . 68 100 
ELN vs ELY 68 . . . . 68 100 
SWDN vs SWDY 68 . . . . 68 100 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically 
disadvantaged, ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 
Table A.6.12 Post-Administration Differential Item Functioning for ELA/L Grade 5 


C- DIF B- DIF A DIF B+ DIF C+ DIF 
Total N 
DIF Comparison of % of % of % of % of % of 
Unique Total Total Total Total Total 
Items 
Male vs Female 61 . . 4 7 53 87 4 7 
White vs Black 61 . . 3 5 58 95 
White vs Hispanic 61 1 2 . . 60 98 
White vs Asian 61 . . . . 60 98 1 2 
White vs Amerindian 61 1 2 3 5 57 93 
White vs Pacific Islander 61 . . 1 2 60 98 
White vs Multiracial 61 . . . . 61 100 
NoEcnDis vs EcnDis 61 . . . . 61 100 
ELN vs ELY 61 2 3 5 8 54 89 
SWDN vs SWDY 61 1 2 . . 60 98 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically 
disadvantaged, ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 


Table A.6.13 Post-Administration Differential Item Functioning for ELA/L Grade 6 


C- DIF B- DIF A DIF B+ DIF C+ DIF 
Total N 
DIF Comparison of % of % of % of % of % of 
Unique Total Total Total Total Total 
Items 
Male vs Female 71 1 1 7 10 63 89 
White vs Black 71 1 1 1 1 69 97 
White vs Hispanic 71 1 1 2 3 68 96 
White vs Asian 71 . . 1 1 70 99 
White vs Amerindian 71 2 3 7 10 61 86 1 1 
White vs Pacific Islander 71 1 1 2 3 68 96 
White vs Multiracial 71 . . . . 71 100 
NoEcnDis vs EcnDis 71 . . . . 71 100 
ELN vs ELY 71 1 1 10 14 60 85 
SWDN vs SWDY 71 . . 1 1 70 99 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically 
disadvantaged, ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 
Table A.6.14 Post-Administration Differential Item Functioning for ELA/L Grade 7 


C- DIF B- DIF A DIF B+ DIF C+ DIF 
Total N 
DIF Comparison of % of % of % of % of % of 
Unique Total Total Total Total Total 
Items 
Male vs Female 57 . . 4 7 53 93 
White vs Black 57 . . 2 4 55 96 
White vs Hispanic 57 1 2 3 5 53 93 
White vs Asian 57 . . . . 56 98 1 2 
White vs Amerindian 57 2 4 4 7 51 89 
White vs Pacific Islander 57 . . . . 57 100 
White vs Multiracial 57 . . . . 57 100 
NoEcnDis vs EcnDis 57 . . 1 2 56 98 
ELN vs ELY 57 4 7 7 12 46 81 
SWDN vs SWDY 57 . . . . 57 100 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically 
disadvantaged, ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 


Table A.6.15 Post-Administration Differential Item Functioning for ELA/L Grade 8 


C- DIF B- DIF A DIF B+ DIF C+ DIF 
Total N 
DIF Comparison of % of % of % of % of % of 
Unique Total Total Total Total Total 
Items 
Male vs Female 67 . . 3 4 63 94 1 1 
White vs Black 67 . . 1 1 66 99 
White vs Hispanic 67 . . 1 1 66 99 
White vs Asian 67 . . . . 67 100 
White vs Amerindian 67 1 1 4 6 62 93 
White vs Pacific Islander 67 . . . . 66 99 1 1 
White vs Multiracial 67 . . . . 67 100 
NoEcnDis vs EcnDis 67 . . . . 67 100 
ELN vs ELY 67 2 3 7 10 58 87 
SWDN vs SWDY 67 . . . . 67 100 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically 
disadvantaged, ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 
Table A.6.16 Post-Administration Differential Item Functioning for ELA/L Grade 9 


C- DIF B- DIF A DIF B+ DIF C+ DIF 
Total N 
DIF Comparison of % of % of % of % of % of 
Unique Total Total Total Total Total 
Items 
Male vs Female 81 1 1 8 10 68 84 4 5 
White vs Black 81 . . 3 4 78 96 
White vs Hispanic 81 . . 3 4 78 96 
White vs Asian 81 . . 1 1 78 96 2 2 
White vs Amerindian 81 4 5 2 2 75 93 
White vs Pacific Islander 81 . . 4 5 77 95 
White vs Multiracial 81 . . . . 81 100 
NoEcnDis vs EcnDis 81 . . 1 1 80 99 
ELN vs ELY 81 3 4 10 12 68 84 
SWDN vs SWDY 81 . . . . 81 100 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically 
disadvantaged, ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 


Table A.6.17 Post-Administration Differential Item Functioning for ELA/L Grade 10 


C- DIF B- DIF A DIF B+ DIF C+ DIF 
Total N 
DIF Comparison of % of % of % of % of % of 
Unique Total Total Total Total Total 
Items 
Male vs Female 58 1 2 6 10 51 88 
White vs Black 58 . . . . 58 100 
White vs Hispanic 58 . . 3 5 55 95 
White vs Asian 58 . . . . 58 100 
White vs Amerindian 58 3 5 4 7 51 88 
White vs Pacific Islander 58 . . 5 9 53 91 
White vs Multiracial 58 . . . . 58 100 
NoEcnDis vs EcnDis 58 . . 1 2 57 98 
ELN vs ELY 58 5 9 7 12 46 79 
SWDN vs SWDY 58 . . . . 58 100 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically 
disadvantaged, ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 


New Meridian February 28, 2020 Page 184 


2019 Technical Report 


Table A.6.18 Post-Administration Differential Item Functioning for ELA/L Grade 11 


C- DIF B- DIF A DIF B+ DIF C+ DIF 
Total N 
DIF Comparison of % of % of % of % of % of 
Unique Total Total Total Total Total 
Items 
Male vs Female 55 . . 3 5 51 93 1 2 
White vs Black 55 1 2 3 5 51 93 
White vs Hispanic 55 . . 2 4 53 96 
White vs Asian 55 . . 2 4 52 95 1 2 
White vs Amerindian 55 4 7 8 15 43 78 
White vs Pacific Islander 55 . . . . 55 100 
White vs Multiracial 55 . . 1 2 54 98 
NoEcnDis vs EcnDis 55 . . 1 2 54 98 
ELN vs ELY 55 2 4 2 4 51 93 
SWDN vs SWDY 55 . . . . 55 100 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically 
disadvantaged, ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 


Table A.6.19 Differential Item Functioning for Mathematics Grade 3 


C- DIF B- DIF A DIF B+ DIF C+ DIF 
Total N 
DIF Comparison of % of % of % of % of % of 
Unique Total Total Total Total Total 
Items 
Male vs Female 77 1 1 75 97 1 1 
White vs Black 77 6 8 68 88 3 4 
White vs Hispanic 77 3 4 74 96 
White vs Asian 77 . . 67 87 9 12 1 1 
White vs Amerindian 77 2 3 75 97 
White vs Pacific Islander 77 1 1 75 97 1 1 
White vs Multiracial 77 1 1 75 97 1 1 
NoEcnDis vs EcnDis 77 . . 77 100 
ELN vs ELY 77 1 1 76 99 
SWDN vs SWDY 77 2 3 75 97 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically disadvantaged, 
ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 
Table A.6.20 Differential Item Functioning for Mathematics Grade 4 


C- DIF B- DIF A DIF B+ DIF C+ DIF 
Total N 
DIF Comparison of % of % of % of % of % of 
Unique Total Total Total Total Total 
Items 
Male vs Female 72 1 1 71 99 
White vs Black 72 3 4 69 96 
White vs Hispanic 72 72 100 
White vs Asian 72 1 1 67 93 6 
White vs Amerindian 72 4 6 68 94 
White vs Pacific Islander 72 1 1 70 97 1 
White vs Multiracial 72 72 100 
NoEcnDis vs EcnDis 72 72 100 
ELN vs ELY 72 4 6 67 93 1 
SWDN vs SWDY 72 2 3 70 97 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically disadvantaged, 
ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 


Table A.6.21 Differential Item Functioning for Mathematics Grade 5 


C- DIF B- DIF A DIF B+ DIF C+ DIF 
Total N 
DIF Comparison of % of % of % of % of % of 
Unique Total Total Total Total Total 
Items 
Male vs Female 71 4 6 67 94 
White vs Black 71 71 100 
White vs Hispanic 71 70 99 1 
White vs Asian 71 71 100 
White vs Amerindian 71 8 11 61 86 3 
White vs Pacific Islander 71 71 100 
White vs Multiracial 71 70 99 1 
NoEcnDis vs EcnDis 71 71 100 
ELN vs ELY 71 6 8 65 92 
SWDN vs SWDY 71 1 1 69 97 1 1 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically disadvantaged, 
ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 
Table A.6.22 Differential Item Functioning for Mathematics Grade 6 


C- DIF B- DIF A DIF B+ DIF C+ DIF 
Total N 
DIF Comparison of % of % of % of % of % of 
Unique Total Total Total Total Total 
Items 
Male vs Female 69 1 1 1 1 66 96 . 1 1 
White vs Black 69 1 1 68 99 
White vs Hispanic 69 . 69 100 
White vs Asian 69 66 96 3 4 
White vs Amerindian 69 3 4 64 93 1 1 1 1 
White vs Pacific Islander 69 . 69 100 
White vs Multiracial 69 1 1 68 99 
NoEcnDis vs EcnDis 69 69 100 
ELN vs ELY 69 1 1 3 4 65 94 
SWDN vs SWDY 69 . 3 4 63 91 3 4 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically disadvantaged, 
ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 


Table A.6.23 Differential Item Functioning for Mathematics Grade 7 


C- DIF B- DIF A DIF B+ DIF C+ DIF 
Total N 
DIF Comparison of % of % of % of % of % of 
Unique Total Total Total Total Total 
Items 
Male vs Female 67 1 1 3 4 62 93 1 1 
White vs Black 67 1 1 . . 66 99 
White vs Hispanic 67 . . 1 1 66 99 
White vs Asian 67 . . . . 59 88 6 9 2 3 
White vs Amerindian 67 . . 1 1 66 99 
White vs Pacific Islander 67 . . 1 1 66 99 
White vs Multiracial 67 . . . . 67 100 
NoEcnDis vs EcnDis 67 . . . . 67 100 
ELN vs ELY 67 1 1 2 3 63 94 1 1 
SWDN vs SWDY 67 . . . . 67 100 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically disadvantaged, 
ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 
Table A.6.24 Differential Item Functioning for Mathematics Grade 8 


C- DIF B- DIF A DIF B+ DIF C+ DIF 
Total N 
DIF Comparison of % of % of % of % of % of 
Unique Total Total Total Total Total 
Items 
Male vs Female 64 2 3 62 97 
White vs Black 64 2 3 61 95 1 2 
White vs Hispanic 64 . . 64 100 
White vs Asian 64 . . 62 97 2 3 
White vs Amerindian 64 1 2 62 97 1 2 
White vs Pacific Islander 64 . . 63 98 1 2 
White vs Multiracial 64 . . 64 100 
NoEcnDis vs EcnDis 64 . . 64 100 
ELN vs ELY 64 5 8 59 92 
SWDN vs SWDY 64 1 2 61 95 2 3 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically disadvantaged, 
ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 


Table A.6.25 Differential Item Functioning for Algebra I 


C- DIF B- DIF A DIF B+ DIF C+ DIF 
Total N 
DIF Comparison of % of % of % of % of % of 
Unique Total Total Total Total Total 
Items 
Male vs Female 111 . . 2 2 108 97 1 1 
White vs Black 111 1 1 2 2 108 97 
White vs Hispanic 111 . . . . 111 100 
White vs Asian 111 . . 1 1 104 94 6 5 
White vs Amerindian 111 . . 3 3 107 96 1 1 
White vs Pacific Islander 111 . . . . 111 100 
White vs Multiracial 111 . . 1 1 109 98 1 1 
NoEcnDis vs EcnDis 111 . . . . 111 100 
ELN vs ELY 111 1 1 4 4 102 92 4 4 
SWDN vs SWDY 111 1 1 . . 110 99 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically disadvantaged, 
ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 
Table A.6.26 Differential Item Functioning for Geometry 


C- DIF B- DIF A DIF B+ DIF C+ DIF 
Total N 
DIF Comparison of % of % of % of % of % of 
Unique Total Total Total Total Total 
Items 
Male vs Female 118 . . 2 2 116 98 
White vs Black 118 . . 2 2 115 97 1 1 
White vs Hispanic 118 . . 2 2 116 98 
White vs Asian 118 . . . . 110 93 7 6 1 1 
White vs Amerindian 118 1 1 4 3 109 92 4 3 
White vs Pacific Islander 118 . . . . 118 100 
White vs Multiracial 118 . . . . 117 99 1 1 
NoEcnDis vs EcnDis 118 . . 1 1 117 99 
ELN vs ELY 118 1 1 8 7 104 88 5 4 
SWDN vs SWDY 118 1 1 2 2 115 97 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically disadvantaged, 
ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 


Table A.6.27 Differential Item Functioning for Algebra II 


C- DIF B- DIF A DIF B+ DIF C+ DIF 
Total N 
DIF Comparison of % of % of % of % of % of 
Unique Total Total Total Total Total 
Items 
Male vs Female 109 . . 5 5 102 94 2 2 
White vs Black 109 . . 3 3 106 97 
White vs Hispanic 109 . . 1 1 108 99 
White vs Asian 109 . . 1 1 98 90 8 7 2 2 
White vs Amerindian 109 1 1 1 1 106 97 1 1 
White vs Pacific Islander 109 . . . . 108 99 1 1 
White vs Multiracial 109 . . . . 109 100 
NoEcnDis vs EcnDis 109 . . . . 109 100 
ELN vs ELY 109 2 2 3 3 99 91 4 4 1 1 
SWDN vs SWDY 109 . . 4 4 105 96 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically disadvantaged, 
ELN = not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 
Table A.6.28 Differential Item Functioning for Integrated Mathematics I 


C- DIF B- DIF A DIF B+ DIF C+ DIF 
Total N 
DIF Comparison of % of % of % of % of % of 
Unique Total Total Total Total Total 
Items 
Male vs Female 42 . . 42 100 
White vs Black 42 . . 42 100 
White vs Hispanic 42 . . 42 100 
White vs Asian 42 . . 42 100 
White vs Amerindian 42 . . 42 100 
White vs Pacific Islander 42 . . 42 100 
White vs Multiracial 42 1 2 41 98 
NoEcnDis vs EcnDis 42 . . 42 100 
ELN vs ELY 42 . . 42 100 
SWDN vs SWDY 42 1 2 41 98 


Note: Amerindian = American Indian/Alaska Native, Black = Black/African American, Hispanic = Hispanic/Latino, Pacific Islander = Native 
Hawaiian/Pacific Islander, Multiracial = Multiple Race Selected, NoEcnDis = not economically disadvantaged, EcnDis = economically disadvantaged, ELN 
= not an English learner, ELY = English learner, SWDN = not student with disability, SWDY = student with disability. 
Appendix 7.1: Post-Equated IRT Results for Spring 2019 English Language 
Arts/Literacy (ELA/L) 


Table A.7.1 Post-Equated IRT Summary Parameter Estimates for All Items for ELA/L by Grade 


b- Estimates Summary a- Estimates Summary 
Item No. of No. of 
Grade Grouping Score Items Mean SD Min Max Mean SD Min Max 
Points 
All Items 128 58 -- 0.91 -1.66 2.05 0.60 0.24 0.22 1.24 
E03 Reading 92 46 -- 0.74 -1.66 2.02 0.50 0.15 0.22 0.84 
Writing 36 12 -- 0.25 1.26 2.05 0.96 0.12 0.72 1.24 
All Items 164 74 -- 1.55 -9.56 2.35 0.45 0.23 0.12 0.99 
E04 Reading 124 -- -- 1.61 -9.56 2.35 0.36 0.13 0.12 0.74 
Writing 40 -- -- 0.37 0.81 1.83 0.89 0.06 0.81 0.99 
All Items 145 -- -- 1.06 -5.38 2.63 0.49 0.21 0.10 0.96 
E05 Reading 112 -- -- 1.06 -5.38 2.06 0.42 0.15 0.10 0.75 
Writing 33 -- -- 0.68 0.51 2.63 0.86 0.07 0.74 0.96 
All Items 172 -- -- 0.87 -1.93 2.95 0.50 0.23 0.18 1.16 
E06 Reading 130 -- -- 0.81 -1.93 2.95 0.43 0.15 0.18 0.96 
Writing 42 -- -- 0.42 0.68 1.88 0.91 0.15 0.63 1.16 
All Items 139 -- -- 0.73 -1.34 2.37 0.48 0.25 0.17 1.13 
E07 Reading 104 -- -- 0.71 -1.34 2.37 0.39 0.12 0.17 0.71 
Writing 35 -- -- 0.42 0.15 1.54 0.96 0.16 0.67 1.13 
All Items 159 -- -- 0.82 -1.88 2.83 0.47 0.25 0.18 1.19 
E08 Reading 124 -- -- 0.82 -1.88 2.83 0.38 0.12 0.18 0.70 
Writing 35 -- -- 0.36 0.37 1.23 1.02 0.16 0.73 1.19 
All Items 197 -- -- 0.80 -1.36 2.95 0.51 0.29 0.14 1.23 
E09 Reading 148 -- -- 0.85 -1.36 2.95 0.40 0.15 0.14 0.76 
Writing 49 -- -- 0.38 0.09 1.55 1.09 0.11 0.81 1.23 
All Items 141 -- -- 0.79 -0.93 2.85 0.49 0.27 0.14 1.12 
E10 Reading 106 -- -- 0.85 -0.93 2.85 0.39 0.15 0.14 0.94 
Writing 35 -- -- 0.32 0.30 1.24 1.02 0.07 0.93 1.12 
All Items 139 -- -- 0.83 -0.67 4.55 0.46 0.24 0.08 1.10 
E11 Reading 104 -- -- 0.89 -0.67 4.55 0.38 0.15 0.08 0.84 
Writing 35 -- -- 0.27 0.68 1.51 0.90 0.15 0.63 1.10 


Note: -- = value not recoverable from the source; one further No. of Items value (14) appears in the source without an identifiable row. 
Table A.7.2 Post-Equated IRT Standard Errors of Item Parameter Estimates for ELA/L by Grade 
b- Estimates Summary 


a- Estimates Summary 


Item No. of No. of 
Grade Grouping Score Items Mean SD Min Max Mean SD Min Max 
Points 
All Items 102 46 0.01 0.00 0.00 0.03 0.01 0.00 0.00 0.02 
E03 Reading 72 36 0.01 0.00 0.00 0.03 0.00 0.00 0.00 0.01 
Writing 30 10 0.01 0.00 0.01 0.02 0.01 0.00 0.01 0.02 
All Items 137 62 0.02 0.07 0.00 0.52 0.00 0.00 0.00 0.02 
E04 Reading 104 52 0.02 0.07 0.00 0.52 0.00 0.00 0.00 0.01 
Writing 33 10 0.01 0.00 0.01 0.01 0.01 0.00 0.01 0.02 
All Items 141 64 0.01 0.01 0.00 0.06 0.00 0.00 0.00 0.01 
E05 Reading 108 54 0.01 0.01 0.00 0.06 0.00 0.00 0.00 0.01 
Writing 33 10 0.01 0.00 0.00 0.02 0.01 0.00 0.01 0.01 
All Items 139 62 0.01 0.01 0.00 0.03 0.00 0.00 0.00 0.02 
E06 Reading 104 52 0.01 0.01 0.00 0.03 0.00 0.00 0.00 0.01 
Writing 35 10 0.01 0.01 0.00 0.03 0.01 0.00 0.00 0.02 
All Items 135 60 0.01 0.00 0.00 0.02 0.01 0.00 0.00 0.02 
E07 Reading 100 50 0.01 0.00 0.00 0.02 0.00 0.00 0.00 0.01 
Writing 35 10 0.01 0.00 0.00 0.02 0.01 0.01 0.00 0.02 
All Items 139 62 0.01 0.00 0.00 0.03 0.00 0.00 0.00 0.02 
E08 Reading 104 52 0.01 0.01 0.00 0.03 0.00 0.00 0.00 0.01 
Writing 35 10 0.01 0.00 0.00 0.01 0.01 0.00 0.01 0.02 
All Items 77 34 0.01 0.01 0.00 0.04 0.01 0.00 0.00 0.02 
E09 Reading 56 28 0.01 0.01 0.01 0.04 0.00 0.00 0.00 0.01 
Writing 21 6 0.01 0.00 0.00 0.01 0.01 0.01 0.01 0.02 
All Items 139 62 0.01 0.01 0.00 0.04 0.01 0.00 0.00 0.02 
E10 Reading 104 52 0.01 0.01 0.01 0.04 0.00 0.00 0.00 0.01 
Writing 35 10 0.01 0.00 0.00 0.01 0.01 0.00 0.01 0.02 
All Items 100 44 0.03 0.06 0.01 0.38 0.01 0.01 0.01 0.03 
E11 Reading 72 36 0.04 0.06 0.01 0.38 0.01 0.00 0.01 0.02 
Writing 28 8 0.02 0.00 0.02 0.03 0.02 0.00 0.02 0.03 
G² Q1 
Item No. of No. 
Grade Grouping Score of Mean SD Min Max Mean SD Min Max 
Points Items 
All Items 102 46 2,732.6 2,016.8 385.7 10,703.0 2,574.6 2,037.9 360.1 11,583.4 
E03 Reading 72 36 2,969.8 2,182.6 385.7 10,703.0 2,786.4 2,223.7 360.1 11,583.4 
Writing 30 10 1,878.6 880.8 416.8 2,880.3 1,812.3 842.8 385.8 2,758.7 
All Items 137 62 3,584.0 3,089.3 163.8 14,358.4 3,473.5 3,003.8 159.2 14,072.8 
E04 Reading 104 52 3,697.0 3,305.7 163.8 14,358.4 3,591.1 3,211.4 159.2 14,072.8 
Writing 33 10 2,996.5 1,518.5 405.1 4,954.5 2,861.6 1,490.4 370.8 4,929.9 
All Items 141 64 2,920.3 3,540.3 151.3 18,025.6 2,806.2 3,505.6 142.6 17,306.2 
E05 Reading 108 54 2,937.7 3,764.1 151.3 18,025.6 2,793.7 3,671.5 142.6 17,306.2 
Writing 33 10 2,826.2 2,070.3 406.3 5,952.4 2,873.7 2,576.5 365.8 8,622.0 
All Items 139 62 3,284.2 2,606.4 289.7 13,658.8 3,055.4 2,407.7 291.5 11,996.2 
E06 Reading 104 52 3,319.6 2,785.0 289.7 13,658.8 3,110.2 2,574.1 291.5 11,996.2 
Writing 35 10 3,100.5 1,431.2 494.8 5,135.3 2,770.1 1,278.4 437.5 4,441.0 
All Items 135 60 3,436.0 4,207.6 148.1 24,499.4 3,263.3 4,170.4 140.0 26,003.2 
E07 Reading 100 50 3,295.4 4,308.1 148.1 24,499.4 3,194.4 4,367.2 140.0 26,003.2 
Writing 35 10 4,139.1 3,788.2 474.8 10,342.1 3,607.9 3,165.3 418.5 8,867.1 
All Items 139 62 3,502.7 3,075.6 125.0 14,717.3 3,296.8 2,871.1 123.0 12,427.9 
E08 Reading 104 52 3,262.2 3,178.3 125.0 14,717.3 3,140.1 3,016.9 123.0 12,427.9 
Writing 35 10 4,753.6 2,189.6 668.3 7,055.1 4,111.3 1,848.5 593.0 6,093.5 
All Items 77 34 2,394.4 2,548.5 252.1 13,398.9 2,225.1 2,452.2 226.2 12,715.7 
E09 Reading 56 28 2,279.7 2,749.1 252.1 13,398.9 2,160.5 2,662.4 226.2 12,715.7 
Writing 21 6 2,929.2 1,279.9 1,419.3 4,383.7 2,526.5 1,130.7 1,203.2 3,824.6 
All Items 139 62 2,325.6 1,874.8 188.5 8,318.2 2,220.8 1,887.5 183.2 8,269.9 
E10 Reading 104 52 2,307.3 2,024.2 188.5 8,318.2 2,247.8 2,041.9 183.2 8,269.9 
Writing 35 10 2,420.4 769.7 920.2 3,692.6 2,080.2 702.3 743.1 3,306.3 
All Items 100 44 565.9 320.9 105.9 1,718.9 514.5 294.2 104.4 1,666.5 
E11 Reading 72 36 520.6 327.7 105.9 1,718.9 477.3 304.8 104.4 1,666.5 
Writing 28 8 770.1 193.0 428.4 1,063.0 682.3 166.6 369.9 902.5 
Appendix 7.2: Pre-Equated IRT Results for Spring 2019 English Language 
Arts/Literacy (ELA/L) 


Table A.7.4 Pre-Equated IRT Summary Parameter Estimates for All Items for ELA/L by Grade 


b- Estimates Summary 


a- Estimates Summary 


Item No. of No. of 
Grade Grouping Score Items Mean SD Min Max Mean SD Min Max 
Points 

All Items 128 58 0.37 0.97 -1.40 3.13 0.59 0.21 0.16 1.01 
E03 Reading 92 46 0.05 0.81 -1.40 3.13 0.51 0.15 0.16 0.84 
Writing 36 12 1.59 0.39 1.16 2.32 0.90 0.10 0.72 1.01 
All Items 164 74 0.24 1.29 -6.48 2.29 0.45 0.22 0.17 1.02 
E04 Reading 124 62 0.06 1.32 -6.48 2.29 0.37 0.12 0.17 0.75 
Writing 40 12 1.19 0.49 0.74 2.29 0.87 0.09 0.67 1.02 
All Items 145 66 0.28 1.15 -6.27 2.69 0.49 0.23 0.19 1.06 
E05 Reading 112 56 0.14 1.16 -6.27 2.23 0.41 0.14 0.19 0.70 
Writing 33 10 1.06 0.71 0.47 2.69 0.91 0.09 0.71 1.06 
All Items 172 77 0.29 0.92 -1.97 4.45 0.51 0.23 0.20 1.13 
E06 Reading 130 65 0.11 0.87 -1.97 4.45 0.45 0.17 0.20 1.10 
Writing 42 12 1.25 0.42 0.59 1.93 0.89 0.15 0.60 1.13 
All Items 139 62 0.22 0.70 -1.33 1.86 0.49 0.24 0.17 1.18 
E07 Reading 104 52 0.10 0.68 -1.33 1.86 0.40 0.13 0.17 0.74 
Writing 35 10 0.84 0.44 0.30 1.70 0.95 0.15 0.66 1.18 
All Items 159 72 0.13 0.78 -2.03 2.68 0.47 0.23 0.19 1.12 
E08 Reading 124 62 0.02 0.79 -2.03 2.68 0.39 0.12 0.19 0.69 
Writing 35 10 0.75 0.37 0.30 1.32 0.98 0.10 0.81 1.12 
All Items 197 88 0.63 0.79 -1.29 2.95 0.52 0.30 0.17 1.44 
E09 Reading 148 74 0.59 0.84 -1.29 2.95 0.40 0.15 0.17 0.73 
Writing 49 14 0.85 0.38 0.12 1.55 1.12 0.16 0.86 1.44 
All Items 141 63 0.62 0.75 -0.54 2.81 0.50 0.28 0.13 1.24 
E10 Reading 106 53 0.59 0.80 -0.54 2.81 0.40 0.16 0.13 0.93 
Writing 35 10 0.80 0.31 0.41 1.25 1.05 0.14 0.84 1.24 
All Items 139 62 0.88 0.68 -0.67 2.80 0.46 0.23 0.14 1.10 
E11 Reading 104 52 0.82 0.71 -0.67 2.80 0.39 0.15 0.14 0.84 
Writing 35 10 1.20 0.33 0.61 1.74 0.85 0.17 0.56 1.10 
Appendix 7.3: Pre-Equated IRT Results for Spring 2019 Mathematics 


Table A.7.5 Pre-Equated IRT Summary Parameter Estimates for All Items for Mathematics by Grade/Subject 


b- Estimates Summary 


a- Estimates Summary 


Item No. of No. of 
Grade Grouping Score Items Mean SD Min Max Mean SD Min Max 
Points 
All Items 110 77 -0.28 0.98 -2.40 1.90 0.79 0.24 0.32 1.33 
SSMC 20 20 -0.77 0.88 -2.14 1.90 0.73 0.14 0.43 0.94 
CR 90 57 -0.11 0.96 -2.40 1.68 0.82 0.26 0.32 1.33 
M03 Type I 74 67 -0.45 0.92 -2.40 1.90 0.83 0.22 0.43 1.33 
Type II 21 6 0.91 0.71 -0.07 1.68 0.43 0.06 0.32 0.51 
Type III 15 4 0.74 0.31 0.52 1.18 0.66 0.16 0.50 0.84 
All Items 112 72 -0.15 0.95 -2.61 2.54 0.74 0.20 0.38 1.32 
SSMC 18 18 -1.06 0.66 -2.01 0.37 0.70 0.22 0.40 1.24 
CR 94 54 0.15 0.83 -2.61 2.54 0.75 0.20 0.38 1.32 
M04 Type I 74 61 -0.30 0.94 -2.61 2.54 0.76 0.20 0.40 1.32 
Type II 20 6 0.56 0.33 -0.11 0.80 0.59 0.17 0.38 0.81 
Type III 18 5 0.82 0.43 0.16 1.17 0.62 0.19 0.40 0.82 
All Items 116 71 0.02 0.91 -2.21 1.77 0.73 0.27 0.19 1.57 
SSMC 20 20 -0.58 0.77 -2.14 0.90 0.78 0.29 0.27 1.42 
CR 96 51 0.25 0.87 -2.21 1.77 0.70 0.26 0.19 1.57 
M05 Type I 71 59 -0.16 0.87 -2.21 1.75 0.76 0.28 0.19 1.57 
Type II 24 7 0.91 0.62 0.05 1.77 0.53 0.18 0.27 0.73 
Type III 21 5 0.81 0.68 -0.17 1.69 0.64 0.15 0.45 0.80 
All Items 121 69 0.36 0.89 -3.02 1.98 0.72 0.24 0.20 1.30 
SSMC 15 15 -0.30 1.00 -3.02 0.74 0.64 0.25 0.20 1.19 
CR 106 54 0.54 0.77 -1.17 1.98 0.75 0.23 0.31 1.30 
M06 Type I 75 57 0.23 0.89 -3.02 1.83 0.75 0.25 0.20 1.30 
Type II 25 7 0.90 0.52 -0.02 1.38 0.59 0.11 0.43 0.74 
Type III 21 5 1.09 0.70 0.13 1.98 0.59 0.11 0.45 0.70 
All Items 112 67 0.75 0.95 -1.03 3.36 0.69 0.29 0.19 1.38 
SSMC 20 20 0.41 1.17 -1.03 3.13 0.53 0.24 0.19 0.88 
CR 92 47 0.90 0.81 -0.67 3.36 0.76 0.28 0.25 1.38 
M07 Type I 70 56 0.61 0.91 -1.03 3.13 0.73 0.30 0.19 1.38 
Type II 21 6 1.72 1.07 0.76 3.36 0.47 0.11 0.31 0.61 
Type III 21 5 1.19 0.44 0.60 1.76 0.58 0.09 0.50 0.74 
b- Estimates Summary a- Estimates Summary 
Item No. of No. of 
Grade Grouping Score Items Mean SD Min Max Mean SD Min Max 
Points 
All Items 115 64 0.91 0.98 -1.12 2.55 0.61 0.21 0.22 1.29 
SSMC 14 14 0.08 0.84 -1.12 1.74 0.47 0.15 0.22 0.78 
CR 101 50 1.15 0.88 -0.76 2.55 0.66 0.21 0.24 1.29 
M08 Type I 73 52 0.70 0.94 -1.12 2.44 0.62 0.23 0.22 1.29 
Type II 24 7 1.65 0.45 1.08 2.45 0.65 0.13 0.55 0.91 
Type III 18 5 2.06 0.40 1.52 2.55 0.54 0.14 0.41 0.79 
All Items 209 111 1.27 1.03 -0.96 3.62 0.58 0.27 0.16 1.41 
SSMC 42 42 0.79 1.09 -0.96 3.62 0.45 0.18 0.16 0.85 
CR 167 69 1.56 0.88 -0.77 3.24 0.66 0.28 0.17 1.41 
A1 Type I 131 91 1.11 1.07 -0.96 3.62 0.57 0.28 0.16 1.41 
Type II 39 11 1.89 0.29 1.55 2.51 0.68 0.17 0.38 0.91 
Type III 39 9 2.09 0.38 1.50 2.60 0.58 0.12 0.41 0.72 
All Items 223 118 1.16 0.94 -1.25 3.83 0.71 0.31 0.19 1.54 
SSMC 26 26 0.62 1.18 -1.25 3.83 0.47 0.19 0.19 0.77 
CR 197 92 1.31 0.80 -0.86 3.50 0.78 0.30 0.19 1.54 
GO Type I 130 95 0.99 0.95 -1.25 3.83 0.71 0.34 0.19 1.54 
Type II 42 12 1.94 0.53 1.17 2.79 0.75 0.08 0.63 0.89 
Type III 51 11 1.78 0.39 1.05 2.23 0.72 0.21 0.36 1.09 
All Items 218 109 1.41 0.92 -1.53 3.67 0.65 0.29 0.18 1.34 
SSMC 24 24 0.86 0.97 -1.53 2.48 0.49 0.20 0.18 0.89 
CR 194 85 1.57 0.85 -0.34 3.67 0.70 0.29 0.19 1.34 
A2 Type I 133 88 1.24 0.88 -1.53 3.67 0.66 0.30 0.18 1.34 
Type II 34 10 1.83 0.85 0.48 3.29 0.63 0.20 0.40 0.96 
Type III 51 11 2.41 0.48 1.64 3.07 0.61 0.25 0.34 1.13 
All Items 81 42 1.02 0.88 -0.64 2.78 0.62 0.23 0.25 1.39 
SSMC 13 13 0.68 0.65 -0.64 1.86 0.49 0.17 0.25 0.84 
CR 68 29 1.16 0.93 -0.52 2.78 0.68 0.23 0.25 1.39 
M1 Type I 49 34 0.75 0.73 -0.64 2.27 0.61 0.25 0.25 1.39 
Type II 14 4 2.02 0.69 1.12 2.78 0.72 0.10 0.57 0.78 
Type III 18 4 2.25 0.30 1.92 2.55 0.61 0.19 0.42 0.78 
b- Estimates Summary a- Estimates Summary 
Item No. of No. of 
Grade Grouping Score Items Mean SD Min Max Mean SD Min Max 
Points 
All Items 80 41 1.58 1.30 -0.67 4.68 0.67 0.31 0.17 1.30 
SSMC 11 11 -0.01 0.56 -0.67 1.44 0.67 0.22 0.31 1.00 
CR 69 30 2.17 0.96 0.41 4.68 0.68 0.35 0.17 1.30 
M2 Type I 48 33 1.43 1.37 -0.67 4.68 0.68 0.34 0.17 1.30 
Type II 14 4 2.43 0.87 1.77 3.62 0.72 0.26 0.46 1.07 
Type III 18 4 1.97 0.58 1.32 2.54 0.62 0.11 0.46 0.71 
All Items 81 40 1.39 0.94 -0.35 3.32 0.57 0.27 0.17 1.27 
SSMC 7 -- 1.02 0.69 -0.27 1.62 0.41 0.21 0.17 0.82 
CR 74 -- 1.47 0.97 -0.35 3.32 0.60 0.27 0.24 1.27 
M3 Type I 49 -- 1.26 0.93 -0.35 3.32 0.59 0.29 0.17 1.27 
Type II 14 -- 1.41 0.70 0.36 1.83 0.50 0.18 0.32 0.69 
Type III 18 4 2.40 0.72 1.57 3.06 0.47 0.09 0.37 0.57 




Note: M03 through M08 = mathematics grades 3 through 8, A1 = Algebra I, GO = Geometry, A2 = Algebra II, M1 = Integrated 
Mathematics I, M2 = Integrated Mathematics II, M3 = Integrated Mathematics III. 
Appendix 11: Students by Grade/Subject and Mode, for Each State 


Table A.11.1 ELA/L Students, by State, and Grade 
English Language Arts-Literacy 


State Category Total Grade 3 Grade 4 Grade 5 Grade 6 Grade 7 Grade 8 Grade 9 Grade 10 Grade 11 
All States N of Students 525,423 72,442 74,167 75,421 78,755 75,084 72,619 3,388 73,487 60 
N of CBT 524,667 72,319 74,069 75,297 78,665 74,990 72,554 3,388 73,325 60 
% of CBT 99.9 99.8 99.9 99.8 99.9 99.9 99.9 100.0 99.8 100.0 
N of PBT 756 123 98 124 90 94 65 n/r 162 n/r 
% of PBT 0.1 0.2 0.1 0.2 0.1 0.1 0.1 n/r 0.2 n/r 
DC % of All Data 7.7 1.2 1.2 1.1 1.0 0.9 0.9 0.6 0.8 0.0 
N of Students 40,867 6,400 6,166 5,935 5,498 4,842 4,560 3,388 4,018 60 
N of CBT 40,732 6,365 6,140 5,905 5,485 4,822 4,552 3,388 4,015 60 
% of CBT 99.7 99.5 99.6 99.5 99.8 99.6 99.8 100.0 99.9 100.0 
N of PBT 135 35 26 30 n/r 20 n/r n/r n/r n/r 
% of PBT 0.3 0.5 0.4 0.5 n/r 0.4 n/r n/r n/r n/r 
DD % of All Data 3.4 n/a n/a n/a 1.1 0.9 0.8 n/a 0.6 n/a 
N of Students 17,615 n/a n/a n/a 5,552 4,650 4,209 n/a 3,204 n/a 
N of CBT 17,603 n/a n/a n/a 5,548 4,646 4,207 n/a 3,202 n/a 
% of CBT 99.9 n/a n/a n/a 99.9 99.9 100.0 n/a 99.9 n/a 
N of PBT n/r n/a n/a n/a n/r n/r n/r n/a n/r n/a 
% of PBT n/r n/a n/a n/a n/r n/r n/r n/a n/r n/a 
MD % of All Data 88.9 12.6 12.9 13.2 12.9 12.5 12.2 n/a 12.6 n/r 
N of Students 466,774 66,025 67,990 69,478 67,695 65,583 63,840 n/a 66,163 n/r 
N of CBT 466,165 65,937 67,918 69,384 67,622 65,513 63,785 n/a 66,006 n/r 
% of CBT 99.9 99.9 99.9 99.9 99.9 99.9 99.9 n/a 99.8 n/r 
N of PBT 609 88 72 94 73 70 55 n/a 157 n/r 
% of PBT 0.1 0.1 0.1 0.1 0.1 0.1 0.1 n/a 0.2 n/r 


Note: DD=Department of Defense Education Activity, DC=District of Columbia, and MD=Maryland; CBT=computer-based test; PBT=paper- 
based test; n/a=not applicable; and n/r=not reported due to n<20. 
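
The mode percentages and the n/r convention in Table A.11.1 can be illustrated with a short sketch. It assumes that the paper count is the student count minus the computer-based count and that a suppressed count also suppresses its percentage; the function name `mode_summary` and the constant name are hypothetical and do not come from the report.

```python
SUPPRESSION_THRESHOLD = 20  # per the table note: counts with n<20 are shown as "n/r"


def mode_summary(n_students: int, n_cbt: int) -> dict:
    """Return % CBT, N PBT, and % PBT for one cell group of the table.

    Assumption (not stated in the report): N of PBT = N of Students - N of CBT,
    and a suppressed PBT count suppresses the PBT percentage as well.
    """
    n_pbt = n_students - n_cbt
    suppressed = n_pbt < SUPPRESSION_THRESHOLD
    return {
        "pct_cbt": round(100 * n_cbt / n_students, 1),
        "n_pbt": "n/r" if suppressed else n_pbt,
        "pct_pbt": "n/r" if suppressed else round(100 * n_pbt / n_students, 1),
    }


# Grade 3, all states combined: 72,442 students, 72,319 tested on computer.
grade3_all = mode_summary(72_442, 72_319)

# Grade 6, DC: 5,498 students, 5,485 on computer, so fewer than 20 on paper.
grade6_dc = mode_summary(5_498, 5_485)
```

Under these assumptions the Grade 3 result matches the published row (99.8, 123, 0.2), and the Grade 6 DC paper cells come out as n/r, as in the table.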
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Table A.11.2 Mathematics Students, by State, and Grade/Subject 


Mathematics

State       Category       Total    Grade 3  Grade 4  Grade 5  Grade 6  Grade 7  Grade 8  A1       GO       A2       M1       M2       M3
All States  N of Students  526,073  79,070   80,595   81,489   78,665   62,845   39,807   84,749   13,390   5,129    144      190      n/a
All States  N of CBT       524,848  78,922   80,492   81,339   78,572   62,759   39,763   84,161   13,378   5,128    144      190      n/a
All States  % of CBT       99.8     100      100      100      100      100      100      99       100      100      100      100      n/a
All States  N of PBT       1,225    148      103      150      93       86       44       588      n/r      n/r      n/r      n/r      n/a
All States  % of PBT       0.2      0        0        0        0        0        0        1        n/r      n/r      n/r      n/r      n/a
DC          % of All Data  7.5      1        1        1        1        1        1        1        1        0        0        0        n/a
DC          N of Students  40,469   6,434    6,218    5,981    5,510    4,775    3,686    3,500    3,826    205      144      190      n/a
DC          N of CBT       40,339   6,404    6,194    5,950    5,497    4,756    3,677    3,499    3,823    205      144      190      n/a
DC          % of CBT       99.7     100      100      100      100      100      100      100      100      100      100      100      n/a
DC          N of PBT       130      30       24       31       n/r      n/r      n/r      n/r      n/r      n/r      n/r      n/r      n/a
DC          % of PBT       0.3      1        0        1        n/r      n/r      n/r      n/r      n/r      n/r      n/r      n/r      n/a
DD          % of All Data  6.2      1        1        1        1        n/a      n/a      1        1        1        n/a      n/a      n/a
DD          N of Students  32,787   6,116    5,922    5,637    5,520    n/a      n/a      3,865    3,029    2,698    n/a      n/a      n/a
DD          N of CBT       32,707   6,084    5,910    5,610    5,516    n/a      n/a      3,865    3,025    2,697    n/a      n/a      n/a
DD          % of CBT       99.8     100      100      100      100      n/a      n/a      100      100      100      n/a      n/a      n/a
DD          N of PBT       80       32       n/r      27       n/r      n/a      n/a      n/r      n/r      n/r      n/a      n/a      n/a
DD          % of PBT       0.2      1        n/r      1        n/r      n/a      n/a      n/r      n/r      n/r      n/a      n/a      n/a
MD          % of All Data  86.0     13       13       13       13       11       7        15       1        0        n/a      n/a      n/a
MD          N of Students  452,662  66,502   68,444   69,863   67,625   58,061   36,111   77,295   6,535    2,226    n/a      n/a      n/a
MD          N of CBT       451,647  66,416   68,377   69,771   67,549   57,994   36,076   76,708   6,530    2,226    n/a      n/a      n/a
MD          % of CBT       99.8     100      100      100      100      100      100      99       100      100      n/a      n/a      n/a
MD          N of PBT       1,015    86       67       92       76       67       35       587      n/r      n/r      n/a      n/a      n/a
MD          % of PBT       0.2      0        0        0        0        0        0        1        n/r      n/r      n/a      n/a      n/a


Note: Includes students taking English-language mathematics tests, students taking Spanish-language mathematics tests, and students taking
accommodated forms. DD=Department of Defense Education Activity, DC=District of Columbia, and MD=Maryland; A1=Algebra I, GO=Geometry,
A2=Algebra II, M1=Integrated Mathematics I, M2=Integrated Mathematics II, M3=Integrated Mathematics III. CBT=computer-based test; PBT=paper-
based test; n/a=not applicable; and n/r=not reported due to n<20.
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Table A.11.3 Spanish-Language Mathematics Students, by State, and Grade/Subject 


Mathematics

State       Category       Total    Grade 3  Grade 4  Grade 5  Grade 6  Grade 7  Grade 8  A1       GO       A2       M1       M2       M3
All States  N of Students  1,602    139      147      152      112      139      149      705      59       n/r      n/r      n/r      n/a
All States  N of CBT       1,184    138      147      152      110      138      149      291      59       n/r      n/r      n/r      n/a
All States  % of CBT       73.9     99       100      100      98       99       100      41       100      n/r      n/r      n/r      n/a
All States  N of PBT       418      n/r      n/a      n/a      n/r      n/r      n/a      414      n/a      n/r      n/r      n/r      n/a
All States  % of PBT       26.1     n/r      n/a      n/a      n/r      n/r      n/a      59       n/a      n/r      n/r      n/r      n/a
DC          % of All Data  28.1     5        4        4        3        4        4        1        4        n/r      n/r      n/r      n/a
DC          N of Students  449      79       68       57       49       57       58       22       59       n/r      n/r      n/r      n/a
DC          N of CBT       449      79       68       57       49       57       58       22       59       n/r      n/r      n/r      n/a
DC          % of CBT       100.0    100      100      100      100      100      100      100      100      n/r      n/r      n/r      n/a
DC          N of PBT       n/r      n/r      n/r      n/r      n/r      n/r      n/r      n/r      n/r      n/r      n/r      n/r      n/a
DC          % of PBT       n/r      n/r      n/r      n/r      n/r      n/r      n/r      n/r      n/r      n/r      n/r      n/r      n/a
DD          % of All Data  n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a
DD          N of Students  n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a
DD          N of CBT       n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a
DD          % of CBT       n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a
DD          N of PBT       n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a
DD          % of PBT       n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a      n/a
MD          % of All Data  71.8     4        5        6        4        5        6        43       n/r      n/r      n/a      n/a      n/a
MD          N of Students  1,153    60       79       95       63       82       91       683      n/r      n/r      n/a      n/a      n/a
MD          N of CBT       735      59       79       95       61       81       91       269      n/r      n/r      n/a      n/a      n/a
MD          % of CBT       63.7     98       100      100      97       99       100      39       n/r      n/r      n/a      n/a      n/a
MD          N of PBT       418      n/r      n/r      n/r      n/r      n/r      n/r      414      n/r      n/r      n/a      n/a      n/a
MD          % of PBT       36.3     n/r      n/r      n/r      n/r      n/r      n/r      61       n/r      n/r      n/a      n/a      n/a


Note: DD=Department of Defense Education Activity, DC=District of Columbia, and MD=Maryland; A1=Algebra I, GO=Geometry, A2=Algebra II,
M1=Integrated Mathematics I, M2=Integrated Mathematics II, M3=Integrated Mathematics III. CBT=computer-based test; PBT=paper-based test;
n/a=not applicable; and n/r=not reported due to n<20.
Grade  Mode  Valid Cases  Female N  Female %  Male N   Male %
3      All   72,442       35,570    49.1      36,872   50.9
3      CBT   72,319       35,523    49.1      36,796   50.9
3      PBT   123          47        38.2      76       61.8
4      All   74,167       36,426    49.1      37,741   50.9
4      CBT   74,069       36,388    49.1      37,681   50.9
4      PBT   98           38        38.8      60       61.2
5      All   75,421       36,916    48.9      38,505   51.1
5      CBT   75,297       36,868    49.0      38,429   51.0
5      PBT   124          48        38.7      76       61.3
6      All   78,755       38,507    48.9      40,248   51.1
6      CBT   78,665       38,468    48.9      40,197   51.1
6      PBT   90           39        43.3      51       56.7
7      All   75,084       36,655    48.8      38,429   51.2
7      CBT   74,990       36,620    48.8      38,370   51.2
7      PBT   94           35        37.2      59       62.8
8      All   72,619       35,820    49.3      36,799   50.7
8      CBT   72,554       35,787    49.3      36,767   50.7
8      PBT   65           33        50.8      32       49.2
9      All   3,388        1,735     51.2      1,653    48.8
9      CBT   3,388        1,735     51.2      1,653    48.8
9      PBT   n/r          n/r       n/r       n/r      n/r
10     All   73,487       35,774    48.7      37,713   51.3
10     CBT   73,325       35,715    48.7      37,610   51.3
10     PBT   162          59        36.4      103      63.6
11     All   60           25        41.7      35       58.3
11     CBT   60           25        41.7      35       58.3
11     PBT   n/r          n/r       n/r       n/r      n/r


Note: CBT=computer-based test; PBT=paper-based test. n/r=not reported due to n<20.
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Table A.11.5 All States Combined: All Mathematics Students by Grade/Subject, Mode, and Gender 


Grade  Mode  Valid Cases  Female N  Female %  Male N   Male %
3      All   79,070       38,807    49.1      40,263   50.9
3      CBT   78,922       38,748    49.1      40,174   50.9
3      PBT   148          59        39.9      89       60.1
4      All   80,595       39,553    49.1      41,042   50.9
4      CBT   80,492       39,511    49.1      40,981   50.9
4      PBT   103          42        40.8      61       59.2
5      All   81,489       39,956    49.0      41,533   51.0
5      CBT   81,339       39,894    49.0      41,445   51.0
5      PBT   150          62        41.3      88       58.7
6      All   78,665       38,504    48.9      40,161   51.1
6      CBT   78,572       38,464    49.0      40,108   51.0
6      PBT   93           40        43.0      53       57.0
7      All   62,845       30,782    49.0      32,063   51.0
7      CBT   62,759       30,753    49.0      32,006   51.0
7      PBT   86           29        33.7      57       66.3
8      All   39,807       18,899    47.5      20,908   52.5
8      CBT   39,763       18,881    47.5      20,882   52.5
8      PBT   44           n/r       n/r       26       59.1
A1     All   84,749       41,214    48.6      43,535   51.4
A1     CBT   84,161       40,963    48.7      43,198   51.3
A1     PBT   588          251       42.7      337      57.3
GO     All   13,390       6,490     48.5      6,900    51.5
GO     CBT   13,378       6,484     48.5      6,894    51.5
GO     PBT   n/r          n/r       n/r       n/r      n/r
A2     All   5,129        2,574     50.2      2,555    49.8
A2     CBT   5,128        2,574     50.2      2,554    49.8
A2     PBT   n/r          n/r       n/r       n/r      n/r
M1     All   144          84        58.3      60       41.7
M1     CBT   144          84        58.3      60       41.7
M1     PBT   n/r          n/r       n/r       n/r      n/r
M2     All   190          88        46.3      102      53.7
M2     CBT   190          88        46.3      102      53.7
M2     PBT   n/r          n/r       n/r       n/r      n/r
M3     All   n/a          n/a       n/a       n/a      n/a
M3     CBT   n/a          n/a       n/a       n/a      n/a
M3     PBT   n/a          n/a       n/r       n/a      n/a


Note: Includes students taking English-language mathematics tests, students taking Spanish-language
mathematics tests, and students taking accommodated forms. A1=Algebra I, GO=Geometry, A2=Algebra II,
M1=Integrated Mathematics I, M2=Integrated Mathematics II, M3=Integrated Mathematics III.
CBT=computer-based test; PBT=paper-based test; n/a=not applicable; n/r=not reported due to n<20.
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Table A.11.6 All States Combined: Spanish-Language Mathematics Students, by Grade/Subject, Mode, and Gender 


Grade  Mode  Valid Cases  Female N  Female %  Male N   Male %
3      All   139          73        52.5      66       47.5
3      CBT   138          72        52.2      66       47.8
3      PBT   n/r          n/r       n/r       n/r      n/r
4      All   147          69        46.9      78       53.1
4      CBT   147          69        46.9      78       53.1
4      PBT   n/r          n/r       n/r       n/r      n/r
5      All   152          67        44.1      85       55.9
5      CBT   152          67        44.1      85       55.9
5      PBT   n/r          n/r       n/r       n/r      n/r
6      All   112          64        57.1      48       42.9
6      CBT   110          63        57.3      47       42.7
6      PBT   n/r          n/r       n/r       n/r      n/r
7      All   139          69        49.6      70       50.4
7      CBT   138          69        50.0      69       50.0
7      PBT   n/r          n/r       n/r       n/r      n/r
8      All   149          65        43.6      84       56.4
8      CBT   149          65        43.6      84       56.4
8      PBT   n/r          n/r       n/r       n/r      n/r
A1     All   705          320       45.4      385      54.6
A1     CBT   291          134       46.0      157      54.0
A1     PBT   414          186       44.9      228      55.1
GO     All   59           27        45.8      32       54.2
GO     CBT   59           27        45.8      32       54.2
GO     PBT   n/r          n/r       n/r       n/r      n/r
A2     All   n/r          n/r       n/r       n/r      n/r
A2     CBT   n/r          n/r       n/r       n/r      n/r
A2     PBT   n/r          n/r       n/r       n/r      n/r
M1     All   n/r          n/r       n/r       n/r      n/r
M1     CBT   n/r          n/r       n/r       n/r      n/r
M1     PBT   n/r          n/r       n/r       n/r      n/r
M2     All   n/r          n/r       n/r       n/r      n/r
M2     CBT   n/r          n/r       n/r       n/r      n/r
M2     PBT   n/r          n/r       n/r       n/r      n/r
M3     All   n/a          n/a       n/a       n/a      n/a
M3     CBT   n/a          n/a       n/a       n/a      n/a
M3     PBT   n/a          n/a       n/a       n/a      n/a


Note: A1=Algebra I, GO=Geometry, A2=Algebra II, M1=Integrated Mathematics I, M2=Integrated
Mathematics II, M3=Integrated Mathematics III. CBT=computer-based test; PBT=paper-based
test; n/a=not applicable; n/r=not reported due to n<20.
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Table A.11.7 Demographic Information for Grade 3 ELA/L, Overall and by State 
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Demographic All States (%) DC (%) DD (%) MD (%) 
Economically Disadvantaged 48.2 77.5 n/a 45.3 
Student with Disabilities 15.1 19.5 n/a 14.6 
English learner 14.0 14.5 n/a 13.9 
Male 50.9 50.1 n/a 51.0 
Female 49.1 49.9 n/a 49.0 
American Indian/Alaska Native 0.3 n/r n/a 0.3 
Asian 6.0 1.3 n/a 6.5 
Black/African American 36.0 66.6 n/a 33.1 
Hispanic/Latino 18.4 16.3 n/a 18.6 
White/Caucasian 34.2 12.8 n/a 36.2 
Native Hawaiian/Pacific Islander 0.1 n/r n/a 0.1 
Two or More Races Reported 4.7 n/r n/a 5.2 
Unknown 0.3 2.8 n/a n/r 


Note: All States=data from all participating states combined; DC=District of Columbia, DD=Department of Defense Education Activity, 
and MD=Maryland; n/a=not applicable; n/r=not reported due to n<20.
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Table A.11.8 Demographic Information for Grade 4 ELA/L, Overall and by State 
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Demographic All States (%) DC (%) DD (%) MD (%) 
Economically Disadvantaged 48.5 77.3 n/a 45.8 
Student with Disabilities 16.3 21.1 n/a 15.8 
English learner 12.8 13.3 n/a 12.7 
Male 50.9 51.2 n/a 50.9 
Female 49.1 48.8 n/a 49.1 
American Indian/Alaska Native 0.2 n/r n/a 0.2 
Asian 6.1 1.5 n/a 6.5 
Black/African American 36.2 66.3 n/a 33.5 
Hispanic/Latino 18.8 17.0 n/a 19.0 
White/Caucasian 33.7 12.5 n/a 35.6 
Native Hawaiian/Pacific Islander 0.1 n/r n/a 0.2 
Two or More Races Reported 4.6 n/r n/a 5.1 
Unknown 0.2 2.4 n/a n/r 


Note: All States=data from all participating states combined; DC=District of Columbia, DD=Department of Defense Education Activity, 
and MD=Maryland; n/a=not applicable; n/r=not reported due to n<20.
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Table A.11.9 Demographic Information for Grade 5 ELA/L, Overall and by State 
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Demographic All States (%) DC (%) DD (%) MD (%) 
Economically Disadvantaged 47.9 79.0 n/a 45.2 
Student with Disabilities 16.8 22.7 n/a 16.3 
English learner 8.9 13.4 n/a 8.6 
Male 51.1 49.6 n/a 51.2 
Female 48.9 50.4 n/a 48.8 
American Indian/Alaska Native 0.3 n/r n/a 0.3 
Asian 6.2 1.3 n/a 6.6 
Black/African American 36.7 68.0 n/a 34.0 
Hispanic/Latino 18.2 18.1 n/a 18.2 
White/Caucasian 33.7 9.9 n/a 35.8 
Native Hawaiian/Pacific Islander 0.2 n/r n/a 0.2 
Two or More Races Reported 4.5 n/r n/a 4.9 
Unknown 0.2 2.4 n/a n/r 


Note: All States=data from all participating states combined; DC=District of Columbia, DD=Department of Defense Education Activity, 
and MD=Maryland; n/a=not applicable; n/r=not reported due to n<20.
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Table A.11.10 Demographic Information for Grade 6 ELA/L, Overall and by State 


Demographic All States (%) DC (%) DD (%) MD (%) 
Economically Disadvantaged 43.6 78.6 n/r 44.3 
Student with Disabilities 16.5 21.9 14.9 16.1 
English learner 6.3 9.6 8.3 5.9 
Male 51.1 49.5 51.9 51.2 
Female 48.9 50.5 48.1 48.8 
American Indian/Alaska Native 0.3 n/r 0.4 0.3 
Asian 6.2 1.2 6.2 6.6 
Black/African American 34.5 68.6 11.5 33.7 
Hispanic/Latino 18.6 17.6 21.5 18.4 
White/Caucasian 35.0 9.9 44.2 36.3 
Native Hawaiian/Pacific Islander 0.3 n/r 1.9 0.2 
Two or More Races Reported 4.9 n/r 12.8 4.6 
Unknown 0.3 2.4 1.5 n/r 


Note: All States=data from all participating states combined; DC=District of Columbia, DD=Department of Defense Education Activity, 
and MD=Maryland; n/a=not applicable; n/r=not reported due to n<20.
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Table A.11.11 Demographic Information for Grade 7 ELA/L, Overall and by State 


Demographic All States (%) DC (%) DD (%) MD (%) 
Economically Disadvantaged 42.2 77.6 n/r 42.6 
Student with Disabilities 16.5 22.3 13.8 16.3 
English learner 5.7 6.8 7.4 5.5
Male 51.2 51.1 50.0 51.3 
Female 48.8 48.9 50.0 48.7 
American Indian/Alaska Native 0.3 n/r 0.5 0.3 
Asian 6.4 1.5 7.2 6.7 
Black/African American 34.6 69.0 11.8 33.7 
Hispanic/Latino 17.9 18.2 20.6 17.7 
White/Caucasian 35.5 8.9 43.2 36.9 
Native Hawaiian/Pacific Islander 0.3 n/r 2.1 0.2 
Two or More Races Reported 4.7 n/r 12.8 4.5
Unknown 0.2 2.2 1.7 n/r 


Note: All States=data from all participating states combined; DC=District of Columbia, DD=Department of Defense Education Activity, 
and MD=Maryland; n/a=not applicable; n/r=not reported due to n<20.
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Table A.11.12 Demographic Information for Grade 8 ELA/L, Overall and by State 


Demographic All States (%) DC (%) DD (%) MD (%) 
Economically Disadvantaged 40.4 76.2 n/r 40.5 
Student with Disabilities 16.3 22.4 13.4 16.0 
English learner 5.6 6.7 6.7 5.4 
Male 50.7 50.3 49.2 50.8 
Female 49.3 49.7 50.8 49.2 
American Indian/Alaska Native 0.2 n/r n/r 0.2 
Asian 6.6 1.3 7.0 7.0 
Black/African American 34.5 69.7 12.0 33.4 
Hispanic/Latino 17.3 16.9 22.1 17.0 
White/Caucasian 36.3 10.0 42.3 37.8 
Native Hawaiian/Pacific Islander 0.3 n/r 2.1 0.2 
Two or More Races Reported 4.5 n/r 12.4 4.3 
Unknown 0.2 1.9 1.7 n/r 


Note: All States=data from all participating states combined; DC=District of Columbia, DD=Department of Defense Education Activity, 
and MD=Maryland; n/a=not applicable; n/r=not reported due to n<20.


New Meridian February 28, 2020 Page 209 


2019 Technical Report 


Table A.11.13 Demographic Information for Grade 9 ELA/L, Overall and by State 


Demographic All States (%) DC (%) DD (%) MD (%) 
Economically Disadvantaged 76.6 76.6 n/a n/a 
Student with Disabilities 22.2 22.2 n/a n/a 
English learner 8.4 8.4 n/a n/a 
Male 48.8 48.8 n/a n/a 
Female 51.2 51.2 n/a n/a 
American Indian/Alaska Native n/r n/r n/a n/a 
Asian 1.4 1.4 n/a n/a 
Black/African American 69.9 69.9 n/a n/a 
Hispanic/Latino 17.1 17.1 n/a n/a 
White/Caucasian 9.7 9.7 n/a n/a 
Native Hawaiian/Pacific Islander n/r n/r n/a n/a 
Two or More Races Reported n/r n/r n/a n/a 
Unknown 1.6 1.6 n/a n/a 


Note: All States=data from all participating states combined; DC=District of Columbia, DD=Department of Defense Education Activity, 
and MD=Maryland; n/a=not applicable; n/r=not reported due to n<20.
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Table A.11.14 Demographic Information for Grade 10 ELA/L, Overall and by State 
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Demographic All States (%) DC (%) DD (%) MD (%) 
Economically Disadvantaged 38.4 78.1 n/r 37.9 
Student with Disabilities 17.1 22.1 11.2 17.0 
English learner 8.0 10.1 5.3 8.0 
Male 51.3 49.5 51.0 51.5 
Female 48.7 50.5 49.0 48.5 
American Indian/Alaska Native 0.2 n/r n/r 0.2 
Asian 6.6 2.0 8.3 6.8 
Black/African American 36.4 69.4 10.8 35.6 
Hispanic/Latino 18.0 18.4 20.4 17.8 
White/Caucasian 34.1 8.5 42.3 35.3 
Native Hawaiian/Pacific Islander 0.2 n/r 1.8 0.1 
Two or More Races Reported 4.3 n/r 13.2 4.1 
Unknown 0.2 1.2 2.8 n/r 


Note: All States=data from all participating states combined; DC=District of Columbia, DD=Department of Defense Education Activity, 
and MD=Maryland; n/a=not applicable; n/r=not reported due to n<20.
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Table A.11.15 Demographic Information for Grade 11 ELA/L, Overall and by State 


Demographic All States (%) DC (%) DD (%) MD (%) 
Economically Disadvantaged 71.7 71.7 n/a n/r 
Student with Disabilities 43.3 43.3 n/a n/r 
English learner n/r n/r n/a n/r 
Male 58.3 58.3 n/a n/r 
Female 41.7 41.7 n/a n/r 
American Indian/Alaska Native n/r n/r n/a n/r 
Asian n/r n/r n/a n/r 
Black/African American 53.3 53.3 n/a n/r 
Hispanic/Latino 43.3 43.3 n/a n/r 
White/Caucasian n/r n/r n/a n/r 
Native Hawaiian/Pacific Islander n/r n/r n/a n/r 
Two or More Races Reported n/r n/r n/a n/r 
Unknown n/r n/r n/a n/r 


Note: All States=data from all participating states combined; DC=District of Columbia, DD=Department of Defense Education Activity, 
and MD=Maryland; n/a=not applicable; n/r=not reported due to n<20.
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Table A.11.16 Demographic Information for Grade 3 Mathematics, Overall and by State 
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Demographic All States (%) DC (%) DD (%) MD (%) 
Economically Disadvantaged 44.5 77.4 n/r 45.4
Student with Disabilities 15.0 19.4 15.3 14.5 
English learner 14.5 15.2 12.2 14.6 
Male 50.9 50.0 50.7 51.0 
Female 49.1 50.0 49.3 49.0 
American Indian/Alaska Native 0.3 n/r 0.3 0.3 
Asian 6.1 1.4 6.1 6.5 
Black/African American 33.9 66.1 10.8 32.9 
Hispanic/Latino 19.1 16.7 22.0 19.0 
White/Caucasian 34.8 12.8 45.2 36.0 
Native Hawaiian/Pacific Islander 0.2 n/r 1.5 0.1 
Two or More Races Reported 5.3 n/r 12.4 5.1 
Unknown 0.4 2.8 1.7 n/r 


Note: All States=data from all participating states combined; DC=District of Columbia, DD=Department of Defense Education Activity, 
and MD=Maryland; n/a=not applicable; n/r=not reported due to n<20.
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Table A.11.17 Demographic Information for Grade 4 Mathematics, Overall and by State 
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Demographic All States (%) DC (%) DD (%) MD (%) 
Economically Disadvantaged 45.0 77.3 n/r 45.9 
Student with Disabilities 16.1 20.9 15.4 15.7 
English learner 13.3 14.1 11.5 13.4 
Male 50.9 51.3 51.0 50.9 
Female 49.1 48.7 49.0 49.1 
American Indian/Alaska Native 0.2 n/r 0.4 0.2 
Asian 6.1 1.5 5.4 6.5 
Black/African American 34.1 65.8 11.0 33.3 
Hispanic/Latino 19.4 17.6 21.7 19.4 
White/Caucasian 34.4 12.5 44.9 35.4 
Native Hawaiian/Pacific Islander 0.3 n/r 1.9 0.1 
Two or More Races Reported 5.2 n/r 13.1 5.0 
Unknown 0.3 2.4 1.5 n/r 


Note: All States=data from all participating states combined; DC=District of Columbia, DD=Department of Defense Education Activity, 
and MD=Maryland; n/a=not applicable; n/r=not reported due to n<20.
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Table A.11.18 Demographic Information for Grade 5 Mathematics, Overall and by State 


Demographic All States (%) DC (%) DD (%) MD (%) 
Economically Disadvantaged 44.6 79.0 n/r 45.2 
Student with Disabilities 16.6 22.7 14.4 16.2 
English learner 9.6 14.0 9.3 9.3 
Male 51.0 49.7 50.2 51.1 
Female 49.0 50.3 49.8 48.9 
American Indian/Alaska Native 0.3 n/r 0.5 0.3 
Asian 6.2 1.4 5.8 6.7 
Black/African American 34.8 67.6 12.3 33.8 
Hispanic/Latino 18.8 18.5 22.1 18.6 
White/Caucasian 34.3 9.9 44.4 35.6 
Native Hawaiian/Pacific Islander 0.3 n/r 1.9 0.2 
Two or More Races Reported 5.0 n/r 12.0 4.9 
Unknown 0.3 2.4 1.1 n/r 


Note: All States=data from all participating states combined; DC=District of Columbia, DD=Department of Defense Education Activity, 
and MD=Maryland; n/a=not applicable; n/r=not reported due to n<20.
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Table A.11.19 Demographic Information for Grade 6 Mathematics, Overall and by State 


Demographic All States (%) DC (%) DD (%) MD (%) 
Economically Disadvantaged 43.7 78.7 n/r 44.4 
Student with Disabilities 16.4 21.6 14.9 16.0 
English learner 6.9 10.1 8.2 6.6 
Male 51.1 49.4 51.9 51.1 
Female 48.9 50.6 48.1 48.9 
American Indian/Alaska Native 0.3 n/r 0.4 0.3 
Asian 6.1 1.2 6.2 6.4 
Black/African American 34.4 68.5 11.5 33.5 
Hispanic/Latino 19.0 17.9 21.4 18.8 
White/Caucasian 34.9 9.7 44.2 36.1 
Native Hawaiian/Pacific Islander 0.3 n/r 2.0 0.2 
Two or More Races Reported 4.8 n/r 12.9 4.6 
Unknown 0.3 2.4 1.5 n/r 


Note: All States=data from all participating states combined; DC=District of Columbia, DD=Department of Defense Education Activity, 
and MD=Maryland; n/a=not applicable; n/r=not reported due to n<20.
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Table A.11.20 Demographic Information for Grade 7 Mathematics, Overall and by State 


Demographic All States (%) DC (%) DD (%) MD (%) 
Economically Disadvantaged 48.8 79.1 n/a 46.3 
Student with Disabilities 18.0 22.4 n/a 17.7 
English learner 7.0 7.5 n/a 7.0 
Male 51.0 50.8 n/a 51.0 
Female 49.0 49.2 n/a 49.0 
American Indian/Alaska Native 0.3 n/r n/a 0.3 
Asian 4.2 1.4 n/a 4.5 
Black/African American 38.6 69.5 n/a 36.0 
Hispanic/Latino 19.4 18.8 n/a 19.4 
White/Caucasian 33.3 8.1 n/a 35.4 
Native Hawaiian/Pacific Islander 0.2 n/r n/a 0.2 
Two or More Races Reported 3.9 n/r n/a 4.3 
Unknown 0.2 2.0 n/a n/r 


Note: All States=data from all participating states combined; DC=District of Columbia, DD=Department of Defense Education Activity, 
and MD=Maryland; n/a=not applicable; n/r=not reported due to n<20.
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Table A.11.21 Demographic Information for Grade 8 Mathematics, Overall and by State 


Demographic All States (%) DC (%) DD (%) MD (%) 
Economically Disadvantaged 55.4 83.3 n/a 52.5 
Student with Disabilities 22.2 25.7 n/a 21.8 
English learner 8.9 8.5 n/a 8.9 
Male 52.5 51.6 n/a 52.6 
Female 47.5 48.4 n/a 47.4 
American Indian/Alaska Native 0.2 n/r n/a 0.3 
Asian 2.5 1.1 n/a 2.6 
Black/African American 45.4 75.9 n/a 42.2 
Hispanic/Latino 19.4 16.6 n/a 19.6 
White/Caucasian 28.8 5.1 n/a 31.2 
Native Hawaiian/Pacific Islander 0.2 n/r n/a 0.2 
Two or More Races Reported 3.5 n/r n/a 3.8 
Unknown 0.1 1.2 n/a n/r 


Note: All States=data from all participating states combined; DC=District of Columbia, DD=Department of Defense Education Activity, 
and MD=Maryland; n/a=not applicable; n/r=not reported due to n<20.
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Table A.11.22 Demographic Information for Algebra I, Overall and by State 
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Demographic All States (%) DC (%) DD (%) MD (%) 
Economically Disadvantaged 40.4 77.3 n/r 40.7 
Student with Disabilities 17.6 21.1 11.8 17.6 
English learner 8.6 9.9 6.2 8.6 
Male 51.4 48.9 51.3 51.5 
Female 48.6 51.1 48.7 48.5 
American Indian/Alaska Native 0.2 n/r n/r 0.2 
Asian 6.4 1.3 8.0 6.5 
Black/African American 37.0 68.6 11.0 36.8 
Hispanic/Latino 19.1 18.5 19.7 19.1 
White/Caucasian 32.6 9.5 43.2 33.1 
Native Hawaiian/Pacific Islander 0.2 n/r 2.1 0.1 
Two or More Races Reported 4.3 n/r 13.1 4.1 
Unknown 0.2 1.9 2.4 n/r 


Note: All States=data from all participating states combined; DC=District of Columbia, DD=Department of Defense Education Activity, 


and MD=Maryland n/a=not applicable; n/r=not reported due to n<20. 
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Table A.11.23 Demographic Information for Geometry, Overall and by State 
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Demographic All States (%) DC (%) DD (%) MD (%) 
Economically Disadvantaged 29.5 80.4 n/r 13.5 
Student with Disabilities 11.4 21.7 10.8 5.6 
English learner 4.4 10.7 5.7 n/r
Male 51.5 50.3 51.3 52.3 
Female 48.5 49.7 48.7 47.7 
American Indian/Alaska Native 0.2 n/r n/r n/r 
Asian 15.3 1.8 9.3 25.9 
Black/African American 29.4 71.2 10.5 13.7 
Hispanic/Latino 12.9 18.0 21.0 6.2 
White/Caucasian 34.9 7.4 40.7 48.3
Native Hawaiian/Pacific Islander 0.6 n/r 2.0 n/r 
Two or More Races Reported 5.6 n/r 13.0 5.5 
Unknown 1.1 1.2 3.2 n/r 


Note: All States=data from all participating states combined; DC=District of Columbia, DD=Department of Defense Education Activity, 
and MD=Maryland; n/a=not applicable; n/r=not reported due to n<20.
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Table A.11.24 Demographic Information for Algebra II, Overall and by State 


Demographic All States (%) DC (%) DD (%) MD (%) 
Economically Disadvantaged 4.6 31.7 n/r 7.6
Student with Disabilities 7.2 19.0 8.5 4.5 
English learner 2.7 9.8 4.3 n/r 
Male 49.8 50.2 47.9 52.1 
Female 50.2 49.8 52.1 47.9 
American Indian/Alaska Native n/r n/r n/r n/r 
Asian 16.8 n/r 8.7 27.7 
Black/African American 11.4 31.2 11.6 9.3 
Hispanic/Latino 13.5 16.6 20.8 4.4
White/Caucasian 46.3 43.9 41.5 52.4 
Native Hawaiian/Pacific Islander 1.4 n/r 2.6 n/r 
Two or More Races Reported 8.9 n/r 12.1 5.9 
Unknown 1.4 n/r 2.3 n/r 


Note: All States=data from all participating states combined; DC=District of Columbia, DD=Department of Defense Education Activity, 
and MD=Maryland; n/a=not applicable; n/r=not reported due to n<20.
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Table A.11.25 Demographic Information for Integrated Mathematics I, Overall and by State 


Demographic All States (%) DC (%) DD (%) MD (%) 
Economically Disadvantaged 68.1 68.1 n/a n/a 
Student with Disabilities 23.6 23.6 n/a n/a 
English learner n/r n/r n/a n/a 
Male 41.7 41.7 n/a n/a 
Female 58.3 58.3 n/a n/a 
American Indian/Alaska Native n/r n/r n/a n/a 
Asian n/r n/r n/a n/a 
Black/African American 50.0 50.0 n/a n/a 
Hispanic/Latino 37.5 37.5 n/a n/a 
White/Caucasian n/r n/r n/a n/a 
Native Hawaiian/Pacific Islander n/r n/r n/a n/a 
Two or More Races Reported n/r n/r n/a n/a 
Unknown n/r n/r n/a n/a 


Note: All States=data from all participating states combined; DC=District of Columbia, DD=Department of Defense Education Activity, 
and MD=Maryland; n/a=not applicable; n/r=not reported due to n<20.


Table A.11.26 Demographic Information for Integrated Mathematics II, Overall and by State 
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Demographic All States (%) DC (%) DD (%) MD (%) 
Economically Disadvantaged 43.2 43.2 n/a n/a 
Student with Disabilities 19.5 19.5 n/a n/a 
English learner n/r n/r n/a n/a 
Male 53.7 53.7 n/a n/a 
Female 46.3 46.3 n/a n/a 
American Indian/Alaska Native n/r n/r n/a n/a 
Asian n/r n/r n/a n/a 
Black/African American 29.5 29.5 n/a n/a 
Hispanic/Latino 32.1 32.1 n/a n/a 
White/Caucasian 25.3 25.3 n/a n/a 
Native Hawaiian/Pacific Islander n/r n/r n/a n/a 
Two or More Races Reported n/r n/r n/a n/a 
Unknown n/r n/r n/a n/a 


Note: All States=data from all participating states combined; DC=District of Columbia, DD=Department of Defense Education Activity, 
and MD=Maryland; n/a=not applicable; n/r=not reported due to n<20.
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Appendix 12.1: Form Composition 


Table A.12.1 Form Composition for ELA/L Grade 3

Claims Subclaims            Number of Items  Number of Points
Reading
Reading Literary Text       6-12             15-27
Reading Informational Text  6-12             15-27
Vocabulary                  4-7              8-14
Claim Total                 20               46
Writing
Written Expression
Knowledge of Conventions
Claim Total                                  36
SUMMATIVE TOTAL             23               82

Note: This table is identical to Table 12.1 in Section 12.


Table A.12.2 Form Composition for ELA/L Grade 4

Claims Subclaims            Number of Items  Number of Points
Reading
Reading Literary Text       6-19             16-40
Reading Informational Text  6-19             16-40
Vocabulary                  4-9              8-18
Claim Total                 28               64
Writing
Written Expression
Knowledge of Conventions
Claim Total
SUMMATIVE TOTAL             31


Table A.12.3 Form Composition for ELA/L Grade 5

Claims Subclaims            Number of Items  Number of Points
Reading
Reading Literary Text       6-19             16-42
Reading Informational Text  6-19             16-42
Vocabulary                  4-9              8-18
Claim Total                 28               64
Writing
Written Expression
Knowledge of Conventions
Claim Total
SUMMATIVE TOTAL             31


Table A.12.4 Form Composition for ELA/L Grade 6

Claims Subclaims            Number of Items  Number of Points
Reading
Reading Literary Text       6-19             16-42
Reading Informational Text  6-19             16-42
Vocabulary                  4-9              8-18
Claim Total                 28               64
Writing
Written Expression
Knowledge of Conventions
Claim Total
SUMMATIVE TOTAL             31


Table A.12.5 Form Composition for ELA/L Grade 7

Claims Subclaims            Number of Items  Number of Points
Reading
Reading Literary Text       6-19             16-42
Reading Informational Text  6-19             16-42
Vocabulary                  4-9              8-18
Claim Total                 28               64
Writing
Written Expression
Knowledge of Conventions
Claim Total
SUMMATIVE TOTAL             31


Table A.12.6 Form Composition for ELA/L Grade 8

Claims Subclaims            Number of Items  Number of Points
Reading
Reading Literary Text       6-19             16-42
Reading Informational Text  6-19             16-42
Vocabulary                  4-9              8-18
Claim Total                 28               64
Writing
Written Expression
Knowledge of Conventions
Claim Total
SUMMATIVE TOTAL             31


Table A.12.7 Form Composition for ELA/L Grade 9

Claims Subclaims            Number of Items  Number of Points
Reading
Reading Literary Text       6-19             16-42
Reading Informational Text  6-19             16-42
Vocabulary                  4-9              8-18
Claim Total                 28               64
Writing
Written Expression
Knowledge of Conventions
Claim Total
SUMMATIVE TOTAL             31


Table A.12.8 Form Composition for ELA/L Grade 10

Claims Subclaims            Number of Items  Number of Points
Reading
Reading Literary Text       6-19             16-42
Reading Informational Text  6-19             16-42
Vocabulary                  4-9              8-18
Claim Total                 28               64
Writing
Written Expression                           36
Knowledge of Conventions                     9
Claim Total                                  45
SUMMATIVE TOTAL             31
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Table A.12.9 Form Composition for ELA/L Grade 11 
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Claims Subclaims Number of Items Number of Points 
Reading 
Reading Literary Text 6-19 16-42 
Reading Informational Text 6-19 16 - 42 
Vocabulary 4-9 8-18 
Claim Total 28 64 
Writing 
Written Expression 2 36 
Knowledge of Conventions 1 9 
Claim Total 3 45 
SUMMATIVE TOTAL 31 109 
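
The totals in Table A.12.9 can be cross-checked with simple arithmetic. The sketch below is only an illustration: the values are transcribed from the table, while the names `add_pairs` and `totals_are_consistent` are hypothetical helpers. The reading subclaims are reported as ranges, so only the claim totals and the writing subclaims are summed.

```python
# (number of items, number of points), transcribed from Table A.12.9
READING_CLAIM_TOTAL = (28, 64)
WRITING_SUBCLAIMS = {
    "Written Expression": (2, 36),
    "Knowledge of Conventions": (1, 9),
}
WRITING_CLAIM_TOTAL = (3, 45)
SUMMATIVE_TOTAL = (31, 109)


def add_pairs(*pairs):
    """Sum (items, points) pairs component-wise."""
    return (sum(p[0] for p in pairs), sum(p[1] for p in pairs))


def totals_are_consistent() -> bool:
    """Check that subclaims add up to claim totals and claims to the summative total."""
    writing_ok = add_pairs(*WRITING_SUBCLAIMS.values()) == WRITING_CLAIM_TOTAL
    summative_ok = (
        add_pairs(READING_CLAIM_TOTAL, WRITING_CLAIM_TOTAL) == SUMMATIVE_TOTAL
    )
    return writing_ok and summative_ok
```

The same check applies to the mathematics tables, where the four subclaim point values sum to the reported total.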


Table A.12.10 Form Composition for Mathematics Grade 3 


Subclaims                          Number of Items  Number of Points
Mathematics
Major Content                      26               30
Additional & Supporting Content                     10
Expressing Mathematical Reasoning                   14
Modeling and Applications                           12
TOTAL                                               66


Note: This table is identical to Table 12.3 in Section 12. 


Table A.12.11 Form Composition for Mathematics Grade 4 


Subclaims                          Number of Items  Number of Points
Mathematics
Major Content                      25
Additional & Supporting Content
Expressing Mathematical Reasoning
Modeling and Applications
TOTAL                              40
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Table A.12.12 Form Composition for Mathematics Grade 5 


Subclaims                          Number of Items  Number of Points
Mathematics
Major Content                      25               30
Additional & Supporting Content                     10
Expressing Mathematical Reasoning                   14
Modeling and Applications                           12
TOTAL                              40               66


Table A.12.13 Form Composition for Mathematics Grade 6

Subclaims                          Number of Items  Number of Points
Mathematics
Major Content                      20               26
Additional & Supporting Content    11               14
Expressing Mathematical Reasoning                   14
Modeling and Applications                           12
TOTAL                              38               66


Table A.12.14 Form Composition for Mathematics Grade 7


Subclaims                            Number of Items   Number of Points
Mathematics
Major Content                        23                29
Additional & Supporting Content                        11
Expressing Mathematical Reasoning                      14
Modeling and Applications                              12
TOTAL                                38                66


Table A.12.15 Form Composition for Mathematics Grade 8


Subclaims                            Number of Items   Number of Points
Mathematics
Major Content                        21                27
Additional & Supporting Content                        13
Expressing Mathematical Reasoning                      14
Modeling and Applications                              12
TOTAL                                36                66
Table A.12.16 Form Composition for Algebra I


Subclaims Number of Items Number of Points 
Mathematics 

Major Content 21 28 

Additional & Supporting Content 13 21 

Expressing Mathematical Reasoning 4 14 

Modeling and Applications 4 18 
TOTAL 42 81 


Table A.12.17 Form Composition for Geometry 


Subclaims Number of Items Number of Points 
Mathematics 

Major Content 21 30 

Additional & Supporting Content 14 19 

Expressing Mathematical Reasoning 4 14 

Modeling and Applications 4 18 
TOTAL 43 81 


Table A.12.18 Form Composition for Algebra II 


Subclaims Number of Items Number of Points 
Mathematics 

Major Content 20 29 

Additional & Supporting Content 13 20 

Expressing Mathematical Reasoning 4 14 

Modeling and Applications 4 18 
TOTAL 41 81


Table A.12.19 Form Composition for Integrated Mathematics I 


Subclaims Number of Items Number of Points 
Mathematics 

Major Content 21 31 

Additional & Supporting Content 13 18 

Expressing Mathematical Reasoning 4 14 

Modeling and Applications 4 18 
TOTAL 42 81 
Table A.12.20 Form Composition for Integrated Mathematics II


Subclaims Number of Items Number of Points 
Mathematics 

Major Content 22 32 

Additional & Supporting Content 12 17 

Expressing Mathematical Reasoning 4 14 

Modeling and Applications 4 18 
TOTAL 42 81 


Table A.12.21 Form Composition for Integrated Mathematics III 


Subclaims Number of Items Number of Points 
Mathematics 

Major Content 19 26 

Additional & Supporting Content 13 23 

Expressing Mathematical Reasoning 4 14 

Modeling and Applications 4 18 
TOTAL 40 81 
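

The form composition tables can be checked arithmetically: within each table, the subclaim rows should sum to the TOTAL row for both items and points. The following is a minimal sketch of such a check, using the values from Tables A.12.16 and A.12.17; the names `FORMS`, `column_sums`, and `check_form` are illustrative and not part of the report.

```python
# Subclaim rows as (number of items, number of points), copied from
# Tables A.12.16 (Algebra I) and A.12.17 (Geometry). Row order:
# Major Content, Additional & Supporting Content,
# Expressing Mathematical Reasoning, Modeling and Applications.
FORMS = {
    "Algebra I": {
        "rows": [(21, 28), (13, 21), (4, 14), (4, 18)],
        "total": (42, 81),
    },
    "Geometry": {
        "rows": [(21, 30), (14, 19), (4, 14), (4, 18)],
        "total": (43, 81),
    },
}


def column_sums(rows):
    """Return (total items, total points) summed over the subclaim rows."""
    items = sum(row[0] for row in rows)
    points = sum(row[1] for row in rows)
    return items, points


def check_form(form):
    """True when the subclaim rows add up to the tabled TOTAL row."""
    return column_sums(form["rows"]) == form["total"]


for name, form in FORMS.items():
    print(name, column_sums(form["rows"]), check_form(form))
```

The same check can be applied to any of the mathematics tables in which all four subclaim rows are available.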
Appendix 12.2: Threshold Scores and Scaling Constants


Table A.12.22 Threshold Scores and Scaling Constants for ELA/L Grades 3 to 8


Assessment    Threshold Cut   Theta     Scale Score   A         B
Grade 3 ELA   Level 2 Cut     -0.9648   700           36.7227   735.4297
              Level 3 Cut     -0.2840   726
              Level 4 Cut      0.3968   750
              Level 5 Cut      2.0360   810
Grade 4 ELA   Level 2 Cut     -1.3004   700           31.5462   741.0214
              Level 3 Cut     -0.5079   725
              Level 4 Cut      0.2846   750
              Level 5 Cut      1.5578   790
Grade 5 ELA   Level 2 Cut     -1.3411   700           29.4580   739.5050
              Level 3 Cut     -0.4924   726
              Level 4 Cut      0.3563   750
              Level 5 Cut      2.0224   799
Grade 6 ELA   Level 2 Cut     -1.3656   700           28.3160   738.6673
              Level 3 Cut     -0.4827   725
              Level 4 Cut      0.4002   750
              Level 5 Cut      1.8133   790
Grade 7 ELA   Level 2 Cut     -1.2488   700           33.9161   742.3542
              Level 3 Cut     -0.5117   725
              Level 4 Cut      0.2254   750
              Level 5 Cut      1.2614   785
Grade 8 ELA   Level 2 Cut     -1.2730   700           34.1183   743.4330
              Level 3 Cut     -0.5402   725
              Level 4 Cut      0.1925   750
              Level 5 Cut      1.4696   794
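

The column headings suggest that a theta value is placed on the reporting scale through a linear transformation with slope A and intercept B. The sketch below assumes that form (scale score = A × theta + B) and applies it to the Grade 3 ELA values in Table A.12.22; the function name `scale_score` is illustrative, and the rounding rule used operationally is not stated in this appendix.

```python
def scale_score(theta, a, b):
    """Unrounded scale score under the assumed linear form a * theta + b."""
    return a * theta + b


# Grade 3 ELA scaling constants and theta cuts from Table A.12.22.
A, B = 36.7227, 735.4297
CUTS = {
    "Level 2": -0.9648,
    "Level 3": -0.2840,
    "Level 4": 0.3968,
    "Level 5": 2.0360,
}

for level, theta in CUTS.items():
    # Print the unrounded value next to the level label for comparison
    # with the tabled threshold scores.
    print(level, round(scale_score(theta, A, B), 2))
```

Under this assumption the Level 2 and Level 4 cuts land on 700 and 750, and the Level 5 cut rounds to 810, as tabled. The Level 3 cut evaluates to roughly 725.00, whereas the table lists 726, so the published thresholds should be read from the table and not recomputed from the constants.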
Table A.12.23 Threshold Scores and Scaling Constants for Mathematics Grades 3 to 8


Assessment            Threshold Cut   Theta     Scale Score   A         B
Grade 3 Mathematics   Level 2 Cut     -1.4141   700           32.1135   745.4119
                      Level 3 Cut     -0.6356   727
                      Level 4 Cut      0.1429   750
                      Level 5 Cut      1.3931   790
Grade 4 Mathematics   Level 2 Cut     -1.3840   700           29.9167   741.4049
                      Level 3 Cut     -0.5484   727
                      Level 4 Cut      0.2873   750
                      Level 5 Cut      1.8323   796
Grade 5 Mathematics   Level 2 Cut     -1.4571   700           29.0301   742.2997
                      Level 3 Cut     -0.5959   725
                      Level 4 Cut      0.2653   750
                      Level 5 Cut      1.6262   790
Grade 6 Mathematics   Level 2 Cut     -1.3829   700           28.1465   738.9252
                      Level 3 Cut     -0.4948   725
                      Level 4 Cut      0.3935   750
                      Level 5 Cut      1.7567   788
Grade 7 Mathematics   Level 2 Cut     -1.4464   700           25.1033   736.3102
                      Level 3 Cut     -0.4505   725
                      Level 4 Cut      0.5453   750
                      Level 5 Cut      1.9919   786
Grade 8 Mathematics   Level 2 Cut     -0.8851   700           32.9505   729.1640
                      Level 3 Cut     -0.1264   728
                      Level 4 Cut      0.6323   750
                      Level 5 Cut      2.1896   801
Table A.12.24 Threshold Scores and Scaling Constants for High School ELA/L


Assessment       Threshold Cut   Theta     Scale Score   A   B
Grade 9 ELA/L    Level 2 Cut     -1.1635   700
                 Level 3 Cut     -0.4329   726
                 Level 4 Cut      0.2977   750
                 Level 5 Cut      1.5065   791
Grade 10 ELA/L   Level 2 Cut     -0.8909   700
                 Level 3 Cut     -0.3112   725
                 Level 4 Cut      0.2684   750
                 Level 5 Cut      1.2858   794
Grade 11 ELA/L   Level 2 Cut     -1.1017   700
                 Level 3 Cut     -0.3859   726
                 Level 4 Cut      0.3298   750
                 Level 5 Cut      1.5206   792
Table A.12.25 Threshold Scores and Scaling Constants for High School Mathematics


Assessment                   Threshold Cut   Theta     Scale Score   A         B
Algebra I                    Level 2 Cut     -1.1781   700           31.5325   737.1490
                             Level 3 Cut     -0.3853   728
                             Level 4 Cut      0.4075   750
                             Level 5 Cut      2.1651   805
Algebra II                   Level 2 Cut     -0.5759   700           37.7676   721.7509
                             Level 3 Cut      0.0860   726
                             Level 4 Cut      0.7480   750
                             Level 5 Cut      2.2128   808
Geometry                     Level 2 Cut     -1.3013   700           25.9775   733.8039
                             Level 3 Cut     -0.3389   726
                             Level 4 Cut      0.6235   750
                             Level 5 Cut      1.8940   783
Integrated Mathematics I     Level 2 Cut     -1.0919   700           32.0043   734.9446
                             Level 3 Cut     -0.3107   726
                             Level 4 Cut      0.4704   750
                             Level 5 Cut      1.9934   799
Integrated Mathematics II    Level 2 Cut     -0.9175   700           29.2865   726.8695
                             Level 3 Cut     -0.0638   725
                             Level 4 Cut      0.7898   750
                             Level 5 Cut      1.9817   785
Integrated Mathematics III   Level 2 Cut     -0.7076   700           37.3549   726.4336
                             Level 3 Cut     -0.0384   726
                             Level 4 Cut      0.6309   750
                             Level 5 Cut      2.0689   804
Table A.12.26 Scaling Constants for Reading and Writing Grades 3 to 11

                 Reading              Writing
                 AR        BR         AW       BW
Grade 3 ELA/L    14.6891   44.1719    7.3445   32.0859
Grade 4 ELA/L    12.6184   46.4086    6.3093   33.2043
Grade 5 ELA/L    11.7832   45.8019    5.8916   32.9010
Grade 6 ELA/L    11.3264   45.4669    5.6632   32.7335
Grade 7 ELA/L    13.5664   46.9416    6.7832   33.4708
Grade 8 ELA/L    13.6472   47.3732    6.8237   33.6866
Grade 9 ELA/L    13.6870   45.9250    6.8435   32.9625
Grade 10 ELA/L   17.2512   45.3690    8.6256   32.6845
Grade 11 ELA/L   13.9712   45.3920    6.9856   32.6961
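

The reading and writing constants appear to be numerically tied to the summative constants A and B in Table A.12.22. The sketch below tests one such relationship for two grades: AR ≈ 0.4 A, BR ≈ 0.4 B − 250, AW ≈ 0.2 A, and BW ≈ 0.2 B − 115. These relationships are inferred here from the tabled numbers alone and are an assumption, not a derivation stated in this appendix; the names `implied_claim_constants` and `max_abs_diff` are illustrative.

```python
# Summative (A, B) from Table A.12.22.
SUMMATIVE = {
    "Grade 3": (36.7227, 735.4297),
    "Grade 6": (28.3160, 738.6673),
}

# Reading and writing constants (AR, BR, AW, BW) from Table A.12.26.
CLAIMS = {
    "Grade 3": (14.6891, 44.1719, 7.3445, 32.0859),
    "Grade 6": (11.3264, 45.4669, 5.6632, 32.7335),
}


def implied_claim_constants(a, b):
    """Claim constants implied by the ASSUMED relationships to A and B."""
    return (0.4 * a, 0.4 * b - 250.0, 0.2 * a, 0.2 * b - 115.0)


def max_abs_diff(grade):
    """Largest absolute gap between implied and tabled claim constants."""
    implied = implied_claim_constants(*SUMMATIVE[grade])
    return max(abs(x - y) for x, y in zip(implied, CLAIMS[grade]))


for grade in SUMMATIVE:
    print(grade, max_abs_diff(grade))
```

For both grades the gaps are within the four-decimal rounding of the tabled values. This is offered only as a consistency check on the tables.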
Appendix 12.3: IRT Test Characteristic Curves, Information Curves, and CSEM Curves


Figure A.12.1 IRT Test Characteristic Curves, Information Curves, and CSEM Curves ELA/L Grade 3
Figure A.12.2 IRT Test Characteristic Curves, Information Curves, and CSEM Curves ELA/L Grade 4
Figure A.12.3 IRT Test Characteristic Curves, Information Curves, and CSEM Curves ELA/L Grade 5
Figure A.12.4 IRT Test Characteristic Curves, Information Curves, and CSEM Curves ELA/L Grade 6
Figure A.12.5 IRT Test Characteristic Curves, Information Curves, and CSEM Curves ELA/L Grade 7
Figure A.12.6 IRT Test Characteristic Curves, Information Curves, and CSEM Curves ELA/L Grade 8
Figure A.12.7 IRT Test Characteristic Curves, Information Curves, and CSEM Curves ELA/L Grade 9
Figure A.12.8 IRT Test Characteristic Curves, Information Curves, and CSEM Curves ELA/L Grade 10
Figure A.12.9 IRT Test Characteristic Curves, Information Curves, and CSEM Curves ELA/L Grade 11
Figure A.12.10 IRT Test Characteristic Curves, Information Curves, and CSEM Curves Mathematics Grade 3
Figure A.12.11 IRT Test Characteristic Curves, Information Curves, and CSEM Curves Mathematics Grade 4
Figure A.12.12 IRT Test Characteristic Curves, Information Curves, and CSEM Curves Mathematics Grade 5
Figure A.12.13 IRT Test Characteristic Curves, Information Curves, and CSEM Curves Mathematics Grade 6
Figure A.12.14 IRT Test Characteristic Curves, Information Curves, and CSEM Curves Mathematics Grade 7
Figure A.12.15 IRT Test Characteristic Curves, Information Curves, and CSEM Curves Mathematics Grade 8
Figure A.12.16 IRT Test Characteristic Curves, Information Curves, and CSEM Curves Algebra I
Figure A.12.17 IRT Test Characteristic Curves, Information Curves, and CSEM Curves Geometry
Figure A.12.18 IRT Test Characteristic Curves, Information Curves, and CSEM Curves Algebra II
Figure A.12.19 IRT Test Characteristic Curves, Information Curves, and CSEM Curves Integrated Mathematics I
Figure A.12.20 IRT Test Characteristic Curves, Information Curves, and CSEM Curves Integrated Mathematics II
Figure A.12.21 IRT Test Characteristic Curves, Information Curves, and CSEM Curves Integrated Mathematics III
Appendix 12.4: Scale Score Cumulative Frequencies
Table A.12.27 Scale Score Cumulative Frequencies: ELA/L Grade 3


Score Band Count Percent Cumulative Count Cumulative Percent

650-654 1,459 2.01 1,459 2.01 
655-659 980 1.35 2,439 3.37 
660-664 663 0.92 3,102 4.28 
665-669 1,137 1.57 4,239 5.85 
670-674 779 1.08 5,018 6.93 
675-679 2,216 3.06 7,234 9.99 
680-684 1,524 2.10 8,758 12.09 
685-689 2,306 3.18 11,064 15.27 
690-694 2,242 3.09 13,306 18.37 
695-699 2,155 2.97 15,461 21.34 
700-704 2,053 2.83 17,514 24.18 
705-709 2,629 3.63 20,143 27.81 
710-714 2,797 3.86 22,940 31.67 
715-719 2,770 3.82 25,710 35.49 
720-724 2,715 3.75 28,425 39.24 
725-729 2,669 3.68 31,094 42.92 
730-734 2,778 3.83 33,872 46.76 
735-739 2,764 3.82 36,636 50.57 
740-744 2,862 3.95 39,498 54.52 
745-749 3,615 4.99 43,113 59.51 
750-754 3,589 4.95 46,702 64.47 
755-759 2,980 4.11 49,682 68.58 
760-764 2,898 4.00 52,580 72.58 
765-769 2,992 4.13 55,572 76.71 
770-774 2,712 3.74 58,284 80.46 
775-779 2,495 3.44 60,779 83.90 
780-784 1,730 2.39 62,509 86.29 
785-789 1,994 2.75 64,503 89.04 
790-794 1,630 2.25 66,133 91.29 
795-799 1,382 1.91 67,515 93.2 
800-804 1,196 1.65 68,711 94.85 
805-809 503 0.69 69,214 95.54 
810-814 829 1.14 70,043 96.69 
815-819 667 0.92 70,710 97.61 
820-824 263 0.36 70,973 97.97 
825-829 468 0.65 71,441 98.62 
830-834 180 0.25 71,621 98.87 
835-839 289 0.40 71,910 99.27 
840-844 98 0.14 72,008 99.40 
845-850 434 0.60 72,442 100 
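

The Percent, Cumulative Count, and Cumulative Percent columns in these tables follow directly from the band counts. The sketch below shows one way to derive them, assuming percentages are taken against the total number of students and rounded to two decimals; the function name `cumulative_table` is illustrative, and the first three Grade 3 ELA/L bands from Table A.12.27 are used with the tabled total of 72,442.

```python
def cumulative_table(bands, counts, total=None):
    """Build (band, count, percent, cumulative count, cumulative percent) rows.

    If total is None, the sum of counts is used as the denominator.
    """
    if total is None:
        total = sum(counts)
    rows = []
    running = 0
    for band, count in zip(bands, counts):
        running += count  # cumulative count through this band
        percent = round(100 * count / total, 2)
        cumulative_percent = round(100 * running / total, 2)
        rows.append((band, count, percent, running, cumulative_percent))
    return rows


# First three score bands of Table A.12.27 (ELA/L Grade 3).
bands = ["650-654", "655-659", "660-664"]
counts = [1459, 980, 663]
rows = cumulative_table(bands, counts, total=72442)

for row in rows:
    print(row)
```

Because the total is passed explicitly, the function can be applied to a subset of bands and still report percentages against the full tested population.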




Table A.12.28 Scale Score Cumulative Frequencies: ELA/L Grade 4 


Score Band Count Percent Cumulative Count Cumulative Percent

650-654 457 0.62 457 0.62 
655-659 299 0.40 756 1.02 
660-664 320 0.43 1,076 1.45 
665-669 637 0.86 1,713 2.31 
670-674 812 1.09 2,525 3.40 
675-679 1,364 1.84 3,889 5.24 
680-684 1,635 2.20 5,524 7.45 
685-689 1,321 1.78 6,845 9.23 
690-694 2,205 2.97 9,050 12.20 
695-699 2,181 2.94 11,231 15.14 
700-704 2,198 2.96 13,429 18.11 
705-709 2,197 2.96 15,626 21.07 
710-714 2,796 3.77 18,422 24.84 
715-719 2,340 3.16 20,762 27.99 
720-724 3,566 4.81 24,328 32.80 
725-729 3,662 4.94 27,990 37.74 
730-734 3,204 4.32 31,194 42.06 
735-739 3,874 5.22 35,068 47.28 
740-744 3,960 5.34 39,028 52.62 
745-749 3,123 4.21 42,151 56.83 
750-754 3,817 5.15 45,968 61.98 
755-759 3,137 4.23 49,105 66.21 
760-764 3,623 4.88 52,728 71.09 
765-769 3,403 4.59 56,131 75.68 
770-774 2,639 3.56 58,770 79.24 
775-779 2,803 3.78 61,573 83.02 
780-784 2,670 3.60 64,243 86.62 
785-789 1,556 2.10 65,799 88.72 
790-794 2,031 2.74 67,830 91.46 
795-799 1,540 2.08 69,370 93.53 
800-804 1,029 1.39 70,399 94.92 
805-809 1,153 1.55 71,552 96.47 
810-814 811 1.09 72,363 97.57 
815-819 491 0.66 72,854 98.23 
820-824 363 0.49 73,217 98.72 
825-829 286 0.39 73,503 99.10 
830-834 264 0.36 73,767 99.46 
835-839 106 0.14 73,873 99.60 
840-844 107 0.14 73,980 99.75 
845-850 187 0.25 74,167 100 
Table A.12.29 Scale Score Cumulative Frequencies: ELA/L Grade 5


Score Band Count Percent Cumulative Count Cumulative Percent

650-654 307 0.41 307 0.41 
655-659 123 0.16 430 0.57 
660-664 403 0.53 833 1.10 
665-669 248 0.33 1,081 1.43 
670-674 376 0.50 1,457 1.93 
675-679 1,668 2.21 3,125 4.14 
680-684 1,196 1.59 4,321 5.73 
685-689 1,484 1.97 5,805 7.70 
690-694 2,410 3.20 8,215 10.89 
695-699 2,413 3.20 10,628 14.09 
700-704 2,283 3.03 12,911 17.12 
705-709 2,314 3.07 15,225 20.19 
710-714 3,293 4.37 18,518 24.55 
715-719 2,707 3.59 21,225 28.14 
720-724 3,316 4.40 24,541 32.54 
725-729 3,419 4.53 27,960 37.07 
730-734 3,356 4.45 31,316 41.52 
735-739 4,084 5.41 35,400 46.94 
740-744 3,518 4.66 38,918 51.6 
745-749 3,867 5.13 42,785 56.73 
750-754 3,683 4.88 46,468 61.61 
755-759 4,046 5.36 50,514 66.98 
760-764 3,418 4.53 53,932 71.51 
765-769 3,225 4.28 57,157 75.78 
770-774 3,112 4.13 60,269 79.91 
775-779 2,925 3.88 63,194 83.79 
780-784 2,672 3.54 65,866 87.33 
785-789 2,460 3.26 68,326 90.59 
790-794 1,992 2.64 70,318 93.23 
795-799 1,170 1.55 71,488 94.79 
800-804 1,369 1.82 72,857 96.60 
805-809 857 1.14 73,714 97.74 
810-814 558 0.74 74,272 98.48 
815-819 439 0.58 74,711 99.06 
820-824 236 0.31 74,947 99.37 
825-829 165 0.22 75,112 99.59 
830-834 131 0.17 75,243 99.76 
835-839 56 0.07 75,299 99.84 
840-844 49 0.06 75,348 99.9 
845-850 73 0.10 75,421 100 
Table A.12.30 Scale Score Cumulative Frequencies: ELA/L Grade 6


Score Band Count Percent Cumulative Count Cumulative Percent

650-654 118 0.15 118 0.15 
655-659 123 0.16 241 0.31 
660-664 155 0.20 396 0.50 
665-669 223 0.28 619 0.79 
670-674 507 0.64 1,126 1.43 
675-679 702 0.89 1,828 2.32 
680-684 1,894 2.40 3,722 4.73 
685-689 1,168 1.48 4,890 6.21 
690-694 2,477 3.15 7,367 9.35 
695-699 2,384 3.03 9,751 12.38 
700-704 2,319 2.94 12,070 15.33 
705-709 2,696 3.42 14,766 18.75 
710-714 3,162 4.01 17,928 22.76 
715-719 3,545 4.50 21,473 27.27 
720-724 3,064 3.89 24,537 31.16 
725-729 4,331 5.50 28,868 36.66 
730-734 4,257 5.41 33,125 42.06 
735-739 4,327 5.49 37,452 47.56 
740-744 4,323 5.49 41,775 53.04 
745-749 4,484 5.69 46,259 58.74 
750-754 4,969 6.31 51,228 65.05 
755-759 3,946 5.01 55,174 70.06 
760-764 4,372 5.55 59,546 75.61 
765-769 3,084 3.92 62,630 79.53 
770-774 3,789 4.81 66,419 84.34 
775-779 2,624 3.33 69,043 87.67 
780-784 2,320 2.95 71,363 90.61 
785-789 2,002 2.54 73,365 93.16 
790-794 1,598 2.03 74,963 95.19 
795-799 1,203 1.53 76,166 96.71 
800-804 812 1.03 76,978 97.74 
805-809 559 0.71 77,537 98.45 
810-814 485 0.62 78,022 99.07 
815-819 232 0.29 78,254 99.36 
820-824 175 0.22 78,429 99.59 
825-829 84 0.11 78,513 99.69 
830-834 92 0.12 78,605 99.81 
835-839 40 0.05 78,645 99.86 
840-844 37 0.05 78,682 99.91 
845-850 73 0.09 78,755 100 
Table A.12.31 Scale Score Cumulative Frequencies: ELA/L Grade 7


Score Band Count Percent Cumulative Count Cumulative Percent

650-654 664 0.88 664 0.88 
655-659 245 0.33 909 1.21 
660-664 315 0.42 1,224 1.63 
665-669 644 0.86 1,868 2.49 
670-674 1,136 1.51 3,004 4.00 
675-679 1,316 1.75 4,320 5.75 
680-684 938 1.25 5,258 7.00 
685-689 1,929 2.57 7,187 9.57 
690-694 1,302 1.73 8,489 11.31 
695-699 1,855 2.47 10,344 13.78 
700-704 2,164 2.88 12,508 16.66 
705-709 1,811 2.41 14,319 19.07 
710-714 2,657 3.54 16,976 22.61 
715-719 2,754 3.67 19,730 26.28 
720-724 2,873 3.83 22,603 30.10 
725-729 2,852 3.80 25,455 33.90 
730-734 3,382 4.50 28,837 38.41 
735-739 3,352 4.46 32,189 42.87 
740-744 3,402 4.53 35,591 47.40 
745-749 3,491 4.65 39,082 52.05 
750-754 4,123 5.49 43,205 57.54 
755-759 3,419 4.55 46,624 62.10 
760-764 3,893 5.18 50,517 67.28 
765-769 3,414 4.55 53,931 71.83 
770-774 3,421 4.56 57,352 76.38 
775-779 3,103 4.13 60,455 80.52 
780-784 2,500 3.33 62,955 83.85 
785-789 2,320 3.09 65,275 86.94 
790-794 2,015 2.68 67,290 89.62 
795-799 1,762 2.35 69,052 91.97 
800-804 1,319 1.76 70,371 93.72 
805-809 1,143 1.52 71,514 95.25 
810-814 944 1.26 72,458 96.50 
815-819 625 0.83 73,083 97.33 
820-824 521 0.69 73,604 98.03 
825-829 424 0.56 74,028 98.59 
830-834 299 0.40 74,327 98.99 
835-839 195 0.26 74,522 99.25 
840-844 141 0.19 74,663 99.44 
845-850 421 0.56 75,084 100 
Table A.12.32 Scale Score Cumulative Frequencies: ELA/L Grade 8


Score Band Count Percent Cumulative Count Cumulative Percent

650-654 664 0.88 664 0.88 
655-659 245 0.33 909 1.21 
660-664 315 0.42 1,224 1.63 
665-669 644 0.86 1,868 2.49 
670-674 1,136 1.51 3,004 4.00 
675-679 1,316 1.75 4,320 5.75 
680-684 938 1.25 5,258 7.00 
685-689 1,929 2.57 7,187 9.57 
690-694 1,302 1.73 8,489 11.31 
695-699 1,855 2.47 10,344 13.78 
700-704 2,164 2.88 12,508 16.66 
705-709 1,811 2.41 14,319 19.07 
710-714 2,657 3.54 16,976 22.61 
715-719 2,754 3.67 19,730 26.28 
720-724 2,873 3.83 22,603 30.10 
725-729 2,852 3.80 25,455 33.90 
730-734 3,382 4.50 28,837 38.41 
735-739 3,352 4.46 32,189 42.87 
740-744 3,402 4.53 35,591 47.40 
745-749 3,491 4.65 39,082 52.05 
750-754 4,123 5.49 43,205 57.54 
755-759 3,419 4.55 46,624 62.10 
760-764 3,893 5.18 50,517 67.28 
765-769 3,414 4.55 53,931 71.83 
770-774 3,421 4.56 57,352 76.38 
775-779 3,103 4.13 60,455 80.52 
780-784 2,500 3.33 62,955 83.85 
785-789 2,320 3.09 65,275 86.94 
790-794 2,015 2.68 67,290 89.62 
795-799 1,762 2.35 69,052 91.97 
800-804 1,319 1.76 70,371 93.72 
805-809 1,143 1.52 71,514 95.25 
810-814 944 1.26 72,458 96.50 
815-819 625 0.83 73,083 97.33 
820-824 521 0.69 73,604 98.03 
825-829 424 0.56 74,028 98.59 
830-834 299 0.4 74,327 98.99 
835-839 195 0.26 74,522 99.25 
840-844 141 0.19 74,663 99.44 
845-850 421 0.56 75,084 100 
Table A.12.33 Scale Score Cumulative Frequencies: ELA/L Grade 9


Score Band Count Percent Cumulative Count Cumulative Percent

650-654 46 1.36 46 1.36 
655-659 21 0.62 67 1.98 
660-664 58 1.71 125 3.69 
665-669 64 1.89 189 5.58 
670-674 7 0.21 196 5.79 
675-679 88 2.60 284 8.38 
680-684 102 3.01 386 11.39 
685-689 156 4.60 542 16.00 
690-694 89 2.63 631 18.62 
695-699 168 4.96 799 23.58 
700-704 144 4.25 943 27.83 
705-709 103 3.04 1,046 30.87 
710-714 148 4.37 1,194 35.24 
715-719 129 3.81 1,323 39.05 
720-724 173 5.11 1,496 44.16 
725-729 152 4.49 1,648 48.64 
730-734 145 4.28 1,793 52.92 
735-739 171 5.05 1,964 57.97 
740-744 162 4.78 2,126 62.75 
745-749 115 3.39 2,241 66.15 
750-754 144 4.25 2,385 70.40 
755-759 146 4.31 2,531 74.70 
760-764 91 2.69 2,622 77.39 
765-769 109 3.22 2,731 80.61 
770-774 103 3.04 2,834 83.65 
775-779 70 2.07 2,904 85.71 
780-784 72 2.13 2,976 87.84 
785-789 97 2.86 3,073 90.7 
790-794 55 1.62 3,128 92.33 
795-799 67 1.98 3,195 94.3 
800-804 43 1.27 3,238 95.57 
805-809 32 0.94 3,270 96.52 
810-814 32 0.94 3,302 97.46 
815-819 19 0.56 3,321 98.02 
820-824 28 0.83 3,349 98.85 
825-829 12 0.35 3,361 99.20 
830-834 12 0.35 3,373 99.56 
835-839 5 0.15 3,378 99.70 
840-844 6 0.18 3,384 99.88 
845-850 4 0.12 3,388 100 
Table A.12.34 Scale Score Cumulative Frequencies: ELA/L Grade 10


Score Band Count Percent Cumulative Count Cumulative Percent

650-654 2,361 3.21 2,361 3.21 
655-659 609 0.83 2,970 4.04 
660-664 1,007 1.37 3,977 5.41 
665-669 1,053 1.43 5,030 6.84 
670-674 464 0.63 5,494 7.48 
675-679 1,687 2.30 7,181 9.77 
680-684 1,376 1.87 8,557 11.64 
685-689 969 1.32 9,526 12.96 
690-694 1,827 2.49 11,353 15.45 
695-699 1,352 1.84 12,705 17.29 
700-704 1,751 2.38 14,456 19.67 
705-709 1,831 2.49 16,287 22.16 
710-714 1,826 2.48 18,113 24.65 
715-719 2,300 3.13 20,413 27.78 
720-724 2,145 2.92 22,558 30.7 
725-729 2,766 3.76 25,324 34.46 
730-734 1,818 2.47 27,142 36.93 
735-739 2,849 3.88 29,991 40.81 
740-744 2,869 3.90 32,860 44.72 
745-749 2,888 3.93 35,748 48.65 
750-754 2,945 4.01 38,693 52.65 
755-759 3,025 4.12 41,718 56.77 
760-764 2,776 3.78 44,494 60.55 
765-769 2,949 4.01 47,443 64.56 
770-774 2,844 3.87 50,287 68.43 
775-779 2,842 3.87 53,129 72.30 
780-784 2,337 3.18 55,466 75.48 
785-789 2,573 3.50 58,039 78.98 
790-794 2,456 3.34 60,495 82.32 
795-799 1,576 2.14 62,071 84.47 
800-804 2,161 2.94 64,232 87.41 
805-809 1,625 2.21 65,857 89.62 
810-814 1,205 1.64 67,062 91.26 
815-819 1,292 1.76 68,354 93.02 
820-824 1,213 1.65 69,567 94.67 
825-829 850 1.16 70,417 95.82 
830-834 722 0.98 71,139 96.80 
835-839 471 0.64 71,610 97.45 
840-844 390 0.53 72,000 97.98 
845-850 1,487 2.02 73,487 100 
Table A.12.35 Scale Score Cumulative Frequencies: ELA/L Grade 11


Score Band Count Percent Cumulative Count Cumulative Percent

650-654 0 0 0 0 
655-659 1 1.67 1 1.67 
660-664 0 0 1 1.67 
665-669 1 1.67 2 3.33 
670-674 3 5.00 5 8.33 
675-679 2 3.33 7 11.67 
680-684 4 6.67 11 18.33 
685-689 4 6.67 15 25.00 
690-694 3 5.00 18 30.00 
695-699 4 6.67 22 36.67 
700-704 7 11.67 29 48.33 
705-709 6 10.00 35 58.33 
710-714 5 8.33 40 66.67 
715-719 1 1.67 41 68.33
720-724 2 3.33 43 71.67 
725-729 2 3.33 45 75.00 
730-734 1 1.67 46 76.67 
735-739 6 10.00 52 86.67 
740-744 1 1.67 53 88.33 
745-749 1 1.67 54 90.00 
750-754 0 0 54 90.00 
755-759 2 3.33 56 93.33 
760-764 0 0 56 93.33 
765-769 2 3.33 58 96.67 
770-774 0 0 58 96.67 
775-779 0 0 58 96.67 
780-784 0 0 58 96.67 
785-789 1 1.67 59 98.33 
790-794 0 0 59 98.33 
795-799 0 0 59 98.33 
800-804 0 0 59 98.33 
805-809 0 0 59 98.33
810-814 1 1.67 60 100 
815-819 0 0 60 100 
820-824 0 0 60 100 
825-829 0 0 60 100 
830-834 0 0 60 100 
835-839 0 0 60 100 
840-844 0 0 60 100 
845-850 0 0 60 100 
Table A.12.36 Scale Score Cumulative Frequencies: Mathematics Grade 3


Score Band Count Percent Cumulative Count Cumulative Percent

650-654 822 1.04 822 1.04 
655-659 677 0.86 1,499 1.90 
660-664 464 0.59 1,963 2.48 
665-669 507 0.64 2,470 3.12 
670-674 1,024 1.30 3,494 4.42 
675-679 1,124 1.42 4,618 5.84 
680-684 1,296 1.64 5,914 7.48 
685-689 1,365 1.73 7,279 9.21 
690-694 2,113 2.67 9,392 11.88 
695-699 2,118 2.68 11,510 14.56 
700-704 2,311 2.92 13,821 17.48 
705-709 2,300 2.91 16,121 20.39 
710-714 2,360 2.98 18,481 23.37 
715-719 3,202 4.05 21,683 27.42 
720-724 3,195 4.04 24,878 31.46 
725-729 3,983 5.04 28,861 36.5 
730-734 4,127 5.22 32,988 41.72 
735-739 4,066 5.14 37,054 46.86 
740-744 3,178 4.02 40,232 50.88 
745-749 4,720 5.97 44,952 56.85 
750-754 3,241 4.10 48,193 60.95 
755-759 4,634 5.86 52,827 66.81 
760-764 2,981 3.77 55,808 70.58 
765-769 3,834 4.85 59,642 75.43 
770-774 3,451 4.36 63,093 79.79 
775-779 2,837 3.59 65,930 83.38 
780-784 2,724 3.45 68,654 86.83 
785-789 2,510 3.17 71,164 90.00 
790-794 1,770 2.24 72,934 92.24 
795-799 1,570 1.99 74,504 94.23 
800-804 1,336 1.69 75,840 95.92 
805-809 765 0.97 76,605 96.88 
810-814 671 0.85 77,276 97.73 
815-819 560 0.71 77,836 98.44 
820-824 428 0.54 78,264 98.98 
825-829 159 0.20 78,423 99.18 
830-834 147 0.19 78,570 99.37 
835-839 229 0.29 78,799 99.66 
840-844 0 0 78,799 99.66 
845-850 271 0.34 79,070 100 
Table A.12.37 Scale Score Cumulative Frequencies: Mathematics Grade 4


Score Band Count Percent Cumulative Count Cumulative Percent

650-654 568 0.70 568 0.70 
655-659 248 0.31 816 1.01 
660-664 344 0.43 1,160 1.44 
665-669 439 0.54 1,599 1.98 
670-674 1,059 1.31 2,658 3.30 
675-679 1,221 1.51 3,879 4.81 
680-684 1,388 1.72 5,267 6.54 
685-689 1,505 1.87 6,772 8.40 
690-694 2,419 3.00 9,191 11.40 
695-699 2,347 2.91 11,538 14.32 
700-704 2,420 3.00 13,958 17.32 
705-709 3,250 4.03 17,208 21.35 
710-714 3,125 3.88 20,333 25.23 
715-719 3,204 3.98 23,537 29.20 
720-724 3,934 4.88 27,471 34.09 
725-729 3,875 4.81 31,346 38.89 
730-734 3,936 4.88 35,282 43.78 
735-739 3,884 4.82 39,166 48.60 
740-744 4,568 5.67 43,734 54.26 
745-749 4,420 5.48 48,154 59.75 
750-754 3,564 4.42 51,718 64.17 
755-759 4,208 5.22 55,926 69.39 
760-764 4,191 5.20 60,117 74.59 
765-769 3,314 4.11 63,431 78.70 
770-774 3,228 4.01 66,659 82.71 
775-779 2,533 3.14 69,192 85.85 
780-784 3,571 4.43 72,763 90.28 
785-789 1,621 2.01 74,384 92.29 
790-794 1,487 1.85 75,871 94.14 
795-799 1,277 1.58 77,148 95.72 
800-804 1,228 1.52 78,376 97.25 
805-809 665 0.83 79,041 98.07 
810-814 509 0.63 79,550 98.70 
815-819 202 0.25 79,752 98.95 
820-824 233 0.29 79,985 99.24 
825-829 123 0.15 80,108 99.40 
830-834 172 0.21 80,280 99.61 
835-839 77 0.10 80,357 99.70 
840-844 109 0.14 80,466 99.84 
845-850 129 0.16 80,595 100 
Table A.12.38 Scale Score Cumulative Frequencies: Mathematics Grade 5


Score Band Count Percent Cumulative Count Cumulative Percent

650-654 179 0.22 179 0.22 
655-659 99 0.12 278 0.34 
660-664 256 0.31 534 0.66 
665-669 212 0.26 746 0.92 
670-674 514 0.63 1,260 1.55 
675-679 1,120 1.37 2,380 2.92 
680-684 602 0.74 2,982 3.66 
685-689 1,755 2.15 4,737 5.81 
690-694 3,232 3.97 7,969 9.78 
695-699 2,399 2.94 10,368 12.72 
700-704 2,464 3.02 12,832 15.75 
705-709 4,768 5.85 17,600 21.60 
710-714 4,434 5.44 22,034 27.04 
715-719 4,035 4.95 26,069 31.99 
720-724 4,613 5.66 30,682 37.65 
725-729 3,539 4.34 34,221 41.99 
730-734 4,054 4.97 38,275 46.97 
735-739 4,859 5.96 43,134 52.93 
740-744 4,406 5.41 47,540 58.34 
745-749 3,610 4.43 51,150 62.77 
750-754 3,369 4.13 54,519 66.90 
755-759 4,020 4.93 58,539 71.84 
760-764 3,749 4.60 62,288 76.44 
765-769 2,479 3.04 64,767 79.48 
770-774 3,555 4.36 68,322 83.84 
775-779 2,301 2.82 70,623 86.67 
780-784 2,211 2.71 72,834 89.38 
785-789 1,963 2.41 74,797 91.79 
790-794 1,767 2.17 76,564 93.96 
795-799 1,592 1.95 78,156 95.91 
800-804 665 0.82 78,821 96.73 
805-809 951 1.17 79,772 97.89 
810-814 446 0.55 80,218 98.44 
815-819 416 0.51 80,634 98.95 
820-824 311 0.38 80,945 99.33 
825-829 114 0.14 81,059 99.47 
830-834 193 0.24 81,252 99.71 
835-839 83 0.10 81,335 99.81 
840-844 42 0.05 81,377 99.86 
845-850 112 0.14 81,489 100 
Table A.12.39 Scale Score Cumulative Frequencies: Mathematics Grade 6


Score Band Count Percent Cumulative Count Cumulative Percent

650-654 815 1.04 815 1.04 
655-659 7 0.01 822 1.04 
660-664 583 0.74 1,405 1.79 
665-669 774 0.98 2,179 2.77 
670-674 0 0 2,179 2.77 
675-679 2,200 2.8 4,379 5.57 
680-684 1,369 1.74 5,748 7.31 
685-689 1,442 1.83 7,190 9.14 
690-694 3,032 3.85 10,222 12.99 
695-699 3,311 4.21 13,533 17.20 
700-704 3,286 4.18 16,819 21.38 
705-709 3,153 4.01 19,972 25.39 
710-714 4,645 5.90 24,617 31.29 
715-719 4,227 5.37 28,844 36.67 
720-724 5,080 6.46 33,924 43.12 
725-729 4,481 5.70 38,405 48.82 
730-734 3,984 5.06 42,389 53.89 
735-739 4,317 5.49 46,706 59.37 
740-744 4,434 5.64 51,140 65.01 
745-749 3,967 5.04 55,107 70.05 
750-754 3,895 4.95 59,002 75.00 
755-759 2,982 3.79 61,984 78.79 
760-764 3,169 4.03 65,153 82.82 
765-769 3,258 4.14 68,411 86.96 
770-774 2,115 2.69 70,526 89.65 
775-779 2,233 2.84 72,759 92.49 
780-784 1,650 2.10 74,409 94.59 
785-789 1,321 1.68 75,730 96.27 
790-794 987 1.25 76,717 97.52 
795-799 708 0.90 77,425 98.42 
800-804 296 0.38 77,721 98.80 
805-809 242 0.31 77,963 99.11 
810-814 284 0.36 78,247 99.47 
815-819 91 0.12 78,338 99.58 
820-824 129 0.16 78,467 99.75 
825-829 46 0.06 78,513 99.81 
830-834 33 0.04 78,546 99.85 
835-839 32 0.04 78,578 99.89 
840-844 29 0.04 78,607 99.93 
845-850 58 0.07 78,665 100 
Table A.12.40 Scale Score Cumulative Frequencies: Mathematics Grade 7


Score Band Count Percent Cumulative Count Cumulative Percent

650-654 149 0.24 149 0.24 
655-659 135 0.21 284 0.45 
660-664 5 0.01 289 0.46 
665-669 299 0.48 588 0.94 
670-674 308 0.49 896 1.43 
675-679 645 1.03 1,541 2.45 
680-684 652 1.04 2,193 3.49 
685-689 1,060 1.69 3,253 5.18 
690-694 1,021 1.62 4,274 6.80 
695-699 2,704 4.30 6,978 11.10 
700-704 2,919 4.64 9,897 15.75 
705-709 4,530 7.21 14,427 22.96 
710-714 4,339 6.90 18,766 29.86 
715-719 3,693 5.88 22,459 35.74 
720-724 4,493 7.15 26,952 42.89 
725-729 3,937 6.26 30,889 49.15 
730-734 5,080 8.08 35,969 57.23 
735-739 3,590 5.71 39,559 62.95 
740-744 3,753 5.97 43,312 68.92 
745-749 2,808 4.47 46,120 73.39 
750-754 3,350 5.33 49,470 78.72 
755-759 2,499 3.98 51,969 82.69 
760-764 2,336 3.72 54,305 86.41 
765-769 2,059 3.28 56,364 89.69 
770-774 1,703 2.71 58,067 92.40 
775-779 1,375 2.19 59,442 94.59 
780-784 1,131 1.80 60,573 96.38 
785-789 577 0.92 61,150 97.30 
790-794 495 0.79 61,645 98.09 
795-799 332 0.53 61,977 98.62 
800-804 260 0.41 62,237 99.03 
805-809 200 0.32 62,437 99.35 
810-814 100 0.16 62,537 99.51 
815-819 99 0.16 62,636 99.67 
820-824 65 0.10 62,701 99.77 
825-829 33 0.05 62,734 99.82 
830-834 13 0.02 62,747 99.84 
835-839 22 0.04 62,769 99.88 
840-844 33 0.05 62,802 99.93 
845-850 43 0.07 62,845 100 
Table A.12.41 Scale Score Cumulative Frequencies: Mathematics Grade 8


Score Band Count Percent Cumulative Count Cumulative Percent

650-654 1,301 3.27 1,301 3.27 
655-659 777 1.95 2,078 5.22 
660-664 743 1.87 2,821 7.09 
665-669 1,250 3.14 4,071 10.23 
670-674 1,101 2.77 5,172 12.99 
675-679 1,524 3.83 6,696 16.82 
680-684 1,280 3.22 7,976 20.04 
685-689 3,116 7.83 11,092 27.86 
690-694 2,918 7.33 14,010 35.19 
695-699 2,805 7.05 16,815 42.24 
700-704 2,446 6.14 19,261 48.39 
705-709 2,180 5.48 21,441 53.86 
710-714 1,946 4.89 23,387 58.75 
715-719 1,784 4.48 25,171 63.23 
720-724 1,474 3.70 26,645 66.94 
725-729 2,592 6.51 29,237 73.45 
730-734 1,120 2.81 30,357 76.26 
735-739 1,400 3.52 31,757 79.78 
740-744 1,594 4.00 33,351 83.78 
745-749 1,271 3.19 34,622 86.97 
750-754 1,077 2.71 35,699 89.68 
755-759 886 2.23 36,585 91.91 
760-764 743 1.87 37,328 93.77 
765-769 602 1.51 37,930 95.28 
770-774 542 1.36 38,472 96.65 
775-779 436 1.10 38,908 97.74 
780-784 283 0.71 39,191 98.45 
785-789 208 0.52 39,399 98.98 
790-794 167 0.42 39,566 99.39 
795-799 74 0.19 39,640 99.58 
800-804 59 0.15 39,699 99.73 
805-809 39 0.10 39,738 99.83 
810-814 17 0.04 39,755 99.87 
815-819 15 0.04 39,770 99.91 
820-824 8 0.02 39,778 99.93 
825-829 5 0.01 39,783 99.94 
830-834 5 0.01 39,788 99.95 
835-839 4 0.01 39,792 99.96 
840-844 5 0.01 39,797 99.97 
845-850 10 0.03 39,807 100.00 
Table A.12.42 Scale Score Cumulative Frequencies: Algebra I

Score Band Count Percent Cumulative Count Cumulative Percent

650-654 409 0.48 409 0.48 
655-659 3 0 412 0.49 
660-664 8 0.01 420 0.50 
665-669 723 0.85 1,143 1.35 
670-674 14 0.02 1,157 1.37 
675-679 1,558 1.84 2,715 3.20 
680-684 1,117 1.32 3,832 4.52 
685-689 1,667 1.97 5,499 6.49 
690-694 3,901 4.60 9,400 11.09 
695-699 2,275 2.68 11,675 13.78 
700-704 5,101 6.02 16,776 19.79 
705-709 2,891 3.41 19,667 23.21 
710-714 7,823 9.23 27,490 32.44 
715-719 4,927 5.81 32,417 38.25 
720-724 4,203 4.96 36,620 43.21 
725-729 5,537 6.53 42,157 49.74 
730-734 4,481 5.29 46,638 55.03 
735-739 3,845 4.54 50,483 59.57 
740-744 3,178 3.75 53,661 63.32 
745-749 3,642 4.30 57,303 67.61 
750-754 3,146 3.71 60,449 71.33 
755-759 3,358 3.96 63,807 75.29 
760-764 2,944 3.47 66,751 78.76 
765-769 2,514 2.97 69,265 81.73 
770-774 2,624 3.10 71,889 84.83 
775-779 2,686 3.17 74,575 88.00 
780-784 2,025 2.39 76,600 90.38 
785-789 1,779 2.10 78,379 92.48 
790-794 1,635 1.93 80,014 94.41 
795-799 1,123 1.33 81,137 95.74 
800-804 942 1.11 82,079 96.85 
805-809 740 0.87 82,819 97.72 
810-814 482 0.57 83,301 98.29 
815-819 321 0.38 83,622 98.67 
820-824 361 0.43 83,983 99.10 
825-829 210 0.25 84,193 99.34 
830-834 166 0.20 84,359 99.54 
835-839 103 0.12 84,462 99.66 
840-844 87 0.10 84,549 99.76 
845-850 200 0.24 84,749 100 
Table A.12.43 Scale Score Cumulative Frequencies: Geometry

Score Band Count Percent Cumulative Count Cumulative Percent

650-654 9 0.07 9 0.07 
655-659 44 0.33 53 0.40 
660-664 1 0.01 54 0.40 
665-669 2 0.01 56 0.42 
670-674 122 0.91 178 1.33 
675-679 0 0 178 1.33 
680-684 95 0.71 273 2.04 
685-689 130 0.97 403 3.01 
690-694 331 2.47 734 5.48 
695-699 176 1.31 910 6.80 
700-704 459 3.43 1,369 10.22 
705-709 456 3.41 1,825 13.63 
710-714 614 4.59 2,439 18.22 
715-719 388 2.90 2,827 21.11 
720-724 713 5.32 3,540 26.44 
725-729 679 5.07 4,219 31.51 
730-734 555 4.14 4,774 35.65 
735-739 612 4.57 5,386 40.22 
740-744 697 5.21 6,083 45.43 
745-749 778 5.81 6,861 51.24 
750-754 719 5.37 7,580 56.61 
755-759 759 5.67 8,339 62.28 
760-764 867 6.47 9,206 68.75 
765-769 734 5.48 9,940 74.23 
770-774 799 5.97 10,739 80.20 
775-779 659 4.92 11,398 85.12 
780-784 589 4.40 11,987 89.52 
785-789 419 3.13 12,406 92.65 
790-794 265 1.98 12,671 94.63 
795-799 239 1.78 12,910 96.42 
800-804 201 1.50 13,111 97.92 
805-809 103 0.77 13,214 98.69 
810-814 77 0.58 13,291 99.26 
815-819 31 0.23 13,322 99.49 
820-824 31 0.23 13,353 99.72 
825-829 15 0.11 13,368 99.84 
830-834 11 0.08 13,379 99.92 
835-839 2 0.01 13,381 99.93 
840-844 3 0.02 13,384 99.96 
845-850 6 0.04 13,390 100 


New Meridian February 28, 2020 Page 274 


2019 Technical Report 


Table A.12.44 Scale Score Cumulative Frequencies: Algebra II 


Score Band Count Percent Cumulative Count Cumulative Percent

650-654 23 0.45 23 0.45 
655-659 27 0.53 50 0.97 
660-664 0 0 50 0.97 
665-669 71 1.38 121 2.36 
670-674 38 0.74 159 3.10 
675-679 49 0.96 208 4.06 
680-684 128 2.50 336 6.55 
685-689 48 0.94 384 7.49 
690-694 128 2.50 512 9.98 
695-699 152 2.96 664 12.95 
700-704 156 3.04 820 15.99 
705-709 164 3.20 984 19.19 
710-714 146 2.85 1,130 22.03 
715-719 228 4.45 1,358 26.48 
720-724 200 3.90 1,558 30.38 
725-729 208 4.06 1,766 34.43 
730-734 140 2.73 1,906 37.16 
735-739 217 4.23 2,123 41.39 
740-744 207 4.04 2,330 45.43 
745-749 196 3.82 2,526 49.25 
750-754 211 4.11 2,737 53.36 
755-759 197 3.84 2,934 57.20 
760-764 224 4.37 3,158 61.57 
765-769 225 4.39 3,383 65.96 
770-774 244 4.76 3,627 70.72 
775-779 225 4.39 3,852 75.10 
780-784 228 4.45 4,080 79.55 
785-789 194 3.78 4,274 83.33 
790-794 161 3.14 4,435 86.47 
795-799 130 2.53 4,565 89.00 
800-804 119 2.32 4,684 91.32 
805-809 89 1.74 4,773 93.06 
810-814 73 1.42 4,846 94.48 
815-819 66 1.29 4,912 95.77 
820-824 47 0.92 4,959 96.69 
825-829 39 0.76 4,998 97.45 
830-834 34 0.66 5,032 98.11 
835-839 26 0.51 5,058 98.62 
840-844 17 0.33 5,075 98.95 
845-850 54 1.05 5,129 100 
Table A.12.45 Scale Score Cumulative Frequencies: Integrated Mathematics I

Score Band Count Percent Cumulative Count Cumulative Percent

650-654 0 0 0 0 
655-659 0 0 0 0 
660-664 1 0.69 1 0.69 
665-669 0 0 1 0.69 
670-674 0 0 1 0.69 
675-679 2 1.39 3 2.08 
680-684 2 1.39 5 3.47 
685-689 1 0.69 6 4.17 
690-694 11 7.64 17 11.81 
695-699 4 2.78 21 14.58 
700-704 5 3.47 26 18.06 
705-709 2 1.39 28 19.44 
710-714 19 13.19 47 32.64 
715-719 6 4.17 53 36.81 
720-724 9 6.25 62 43.06 
725-729 9 6.25 71 49.31 
730-734 14 9.72 85 59.03 
735-739 9 6.25 94 65.28 
740-744 11 7.64 105 72.92 
745-749 7 4.86 112 77.78 
750-754 8 5.56 120 83.33 
755-759 7 4.86 127 88.19 
760-764 3 2.08 130 90.28 
765-769 5 3.47 135 93.75 
770-774 0 0 135 93.75 
775-779 5 3.47 140 97.22 
780-784 0 0 140 97.22 
785-789 1 0.69 141 97.92 
790-794 1 0.69 142 98.61 
795-799 0 0 142 98.61 
800-804 0 0 142 98.61 
805-809 1 0.69 143 99.31 
810-814 0 0 143 99.31 
815-819 1 0.69 144 100 
820-824 0 0 144 100 
825-829 0 0 144 100 
830-834 0 0 144 100 
835-839 0 0 144 100 
840-844 0 0 144 100 
845-850 0 0 144 100 
Table A.12.46 Scale Score Cumulative Frequencies: Integrated Mathematics II

Score Band Count Percent Cumulative Count Cumulative Percent

650-654 0 0 0 0 
655-659 0 0 0 0 
660-664 0 0 0 0 
665-669 0 0 0 0 
670-674 0 0 0 0 
675-679 3 1.58 3 1.58 
680-684 0 0 3 1.58 
685-689 3 1.58 6 3.16 
690-694 10 5.26 16 8.42 
695-699 9 4.74 25 13.16 
700-704 7 3.68 32 16.84 
705-709 8 4.21 40 21.05 
710-714 4 2.11 44 23.16 
715-719 4 2.11 48 25.26 
720-724 15 7.89 63 33.16 
725-729 12 6.32 75 39.47 
730-734 12 6.32 87 45.79 
735-739 6 3.16 93 48.95 
740-744 11 5.79 104 54.74 
745-749 13 6.84 117 61.58 
750-754 11 5.79 128 67.37 
755-759 9 4.74 137 72.11 
760-764 7 3.68 144 75.79 
765-769 14 7.37 158 83.16 
770-774 2 1.05 160 84.21 
775-779 8 4.21 168 88.42 
780-784 6 3.16 174 91.58 
785-789 3 1.58 177 93.16 
790-794 2 1.05 179 94.21 
795-799 4 2.11 183 96.32 
800-804 0 0 183 96.32 
805-809 4 2.11 187 98.42 
810-814 0 0 187 98.42 
815-819 3 1.58 190 100 
820-824 0 0 190 100 
825-829 0 0 190 100 
830-834 0 0 190 100 
835-839 0 0 190 100 
840-844 0 0 190 100 
845-850 0 0 190 100 
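
The cumulative columns in Tables A.12.40 through A.12.46 follow directly from the band counts: each cumulative count is the running sum of the counts, and each percent is the corresponding count divided by the table total. The Python sketch below illustrates that relationship using five consecutive bands of Table A.12.46 (675-679 through 695-699) and the table total of 190. The function name `cumulative_frequencies` is hypothetical and is not taken from the report's production code.

```python
from itertools import accumulate


def cumulative_frequencies(counts, total):
    """Return (count, percent, cumulative count, cumulative percent) per band."""
    rows = []
    for count, cum in zip(counts, accumulate(counts)):
        rows.append(
            (count, round(100 * count / total, 2), cum, round(100 * cum / total, 2))
        )
    return rows


# Counts for bands 675-679 through 695-699 of Table A.12.46.
# All lower bands have a count of 0, so the running sum starts at 0.
rows = cumulative_frequencies([3, 0, 3, 10, 9], total=190)
for row in rows:
    print(row)
```

The printed rows agree with the corresponding rows of Table A.12.46, for example a cumulative count of 25 and a cumulative percent of 13.16 for the 695-699 band.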
Table A.12.47 Subgroup Performance for ELA/L Scale Scores: Grade 3

Group Type Group N Mean SD Min Max
Full Summative Score 72,442 737.23 42.45 650 850
Gender Female 35,570 742.86 42.63 650 850
Male 36,872 731.80 41.57 650 850
Ethnicity American Indian/Alaska Native 196 733.54 38.23 650 839
Asian 4,357 764.22 40.31 650 850
Black/African American 26,106 723.36 39.21 650 850
Hispanic/Latino 13,332 723.34 39.13 650 850
Native Hawaiian/Pacific Islander 95 736.84 42.64 651 832
Two or more races 3,420 745.92 41.31 650 850
White 24,754 753.26 39.64 650 850
Economic Status* Not Economically Disadvantaged 37,541 752.70 40.29 650 850
Economically Disadvantaged 34,883 720.59 38.21 650 850
English Learner Status Non English Learner 56,834 742.34 42.07 650 850
English Learner 10,128 711.65 34.41 650 850
Disabilities Students without Disabilities 61,539 742.60 40.79 650 850
Students with Disabilities 10,903 706.89 38.76 650 850
Reading Summative Score 72,442 45.90 17.37 10 90
Gender Female 35,570 47.40 17.15 10 90
Male 36,872 44.45 17.45 10 90
Ethnicity American Indian/Alaska Native 196 44.89 16.03 10 88
Asian 4,357 56.19 16.51 10 90
Black/African American 26,106 40.10 15.70 10 90
Hispanic/Latino 13,332 39.75 15.77 10 90
Native Hawaiian/Pacific Islander 95 45.65 16.60 42 79
Two or more races 3,420 49.59 16.79 10 90
White 24,754 52.94 16.43 10 90
Economic Status* Not Economically Disadvantaged 37,541 52.27 16.64 10 90
Economically Disadvantaged 34,883 39.05 15.42 10 90
English Learner Status Non English Learner 56,834 48.07 17.21 10 90
English Learner 10,128 34.90 13.59 10 90
Disabilities Students without Disabilities 61,539 47.95 16.70 10 90
Students with Disabilities 10,903 34.29 16.44 10 90
Writing Summative Score 72,442 29.24 12.83 10 60
Gender Female 35,570 31.45 12.52 10 60
Male 36,872 27.10 12.75 10 60
Ethnicity American Indian/Alaska Native 196 27.96 12.30 10 5A
Asian 4,357 36.68 11.16 10 60
Black/African American 26,106 25.69 12.57 10 60
Hispanic/Latino 13,332 26.33 12.38 10 60
Native Hawaiian/Pacific Islander 95 27.89 14.16 10 60
Two or more races 3,420 31.20 12.54 10 60
White 24,754 32.95 11.98 10 60
Economic Status* Not Economically Disadvantaged 37,541 33.25 11.90 10 60
Economically Disadvantaged 34,883 24.93 12.37 10 60
English Learner Status Non English Learner 56,834 30.42 12.70 10 60
English Learner 10,128 23.46 11.75 10 60
Disabilities Students without Disabilities 61,539 30.79 12.36 10 60
Students with Disabilities 10,903 20.45 11.84 10 60


Note: This table is identical to Table 12.7 in Section 12. *Economic status was based on participation 
in National School Lunch Program (NSLP): receipt of free or reduced-price lunch (FRL). 
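
As a consistency check on subgroup tables of this kind, the N-weighted average of the subgroup means within a group type should reproduce the overall mean whenever the subgroups partition the full sample, as the two gender rows do here (35,570 + 36,872 = 72,442). The Python sketch below applies that check to the Grade 3 Full Summative Score rows of Table A.12.47. The helper name `weighted_mean` is illustrative only and does not come from the report.

```python
def weighted_mean(groups):
    """N-weighted mean of (n, mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n


# Full Summative Score, Grade 3: (N, Mean) for Female and Male.
gender = [(35570, 742.86), (36872, 731.80)]
overall = weighted_mean(gender)
print(round(overall, 2))
```

Because the published subgroup means are themselves rounded to two decimals, agreement with the reported overall mean of 737.23 should be expected only to within rounding.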
Table A.12.48 Subgroup Performance for ELA/L Scale Scores: Grade 4

Group Type Group N Mean SD Min Max
Full Summative Score 74,167 741.91 38.12 650 850
Gender Female 36,426 747.98 37.73 650 850
Male 37,741 736.05 37.57 650 850
Ethnicity American Indian/Alaska Native 167 739.34 35.48 650 809
Asian 4,491 766.87 35.27 650 850
Black/African American 26,848 728.93 35.16 650 850
Hispanic/Latino 13,959 731.48 35.14 650 850
Native Hawaiian/Pacific Islander 107 737.94 39.47 676 828
Two or more races 3,438 748.76 37.20 650 850
White 25,007 756.11 35.91 650 850
Economic Status* Not Economically Disadvantaged 38,215 755.52 36.56 650 850
Economically Disadvantaged 35,938 727.45 34.20 650 850
English Learner Status Non English Learner 59,354 745.97 37.77 650 850
English Learner 9,466 717.90 29.49 650 850
Disabilities Students without Disabilities 62,096 747.54 36.03 650 850
Students with Disabilities 12,070 712.92 35.31 650 850
Reading Summative Score 74,167 47.14 15.36 10 90
Gender Female 36,426 48.64 15.13 10 90
Male 37,741 45.69 15.45 10 90
Ethnicity American Indian/Alaska Native 167 46.24 14.23 4 83
Asian 4,491 56.85 14.47 10 90
Black/African American 26,848 41.87 13.81 10 90
Hispanic/Latino 13,959 42.50 13.98 10 90
Native Hawaiian/Pacific Islander 107 44.81 16.20 17 90
Two or more races 3,438 50.19 15.15 10 90
White 25,007 53.18 14.70 10 90
Economic Status* Not Economically Disadvantaged 38,215 52.72 14.95 10 90
Economically Disadvantaged 35,938 41.21 13.46 10 90
English Learner Status Non English Learner 59,354 48.91 15.27 10 90
English Learner 9,466 36.93 11.28 10 90
Disabilities Students without Disabilities 62,096 49.18 14.66 10 90
Students with Disabilities 12,070 36.67 14.60 10 90
Writing Summative Score 74,167 31.74 10.93 10 60
Gender Female 36,426 34.12 10.27 10 60
Male 37,741 29.44 11.05 10 60
Ethnicity American Indian/Alaska Native 167 30.69 11.05 10 53
Asian 4,491 37.93 9.08 10 60
Black/African American 26,848 28.60 10.96 10 60
Hispanic/Latino 13,959 29.79 10.59 10 60
Native Hawaiian/Pacific Islander 107 31.87 10.53 10 52
Two or more races 3,438 33.20 10.40 10 60
White 25,007 34.86 10.06 10 60
Economic Status* Not Economically Disadvantaged 38,215 34.94 10.07 10 60
Economically Disadvantaged 35,938 28.35 10.78 10 60
English Learner Status Non English Learner 59,354 32.58 10.77 10 60
English Learner 9,466 26.64 10.16 10 60
Disabilities Students without Disabilities 62,096 33.46 10.02 10 60
Students with Disabilities 12,070 22.90 11.15 10 60

Note: *Economic status was based on participation in National School Lunch Program (NSLP): receipt 
of free or reduced-price lunch (FRL). 
Table A.12.49 Subgroup Performance for ELA/L Scale Scores: Grade 5

Group Type Group N Mean SD Min Max
Full Summative Score 75,421 741.86 35.86 650 850
Gender Female 36,916 747.84 35.62 650 850
Male 38,505 736.13 35.15 650 850
Ethnicity American Indian/Alaska Native 245 739.60 33.37 650 823
Asian 4,673 766.47 32.74 650 850
Black/African American 27,665 729.19 32.67 650 850
Hispanic/Latino 13,731 730.56 33.52 650 850
Native Hawaiian/Pacific Islander 118 748.06 36.10 664 841
Two or more races 3,406 748.28 34.58 650 850
White 25,438 756.23 33.10 650 850
Economic Status* Not Economically Disadvantaged 39,324 755.06 33.98 650 850
Economically Disadvantaged 36,089 727.50 32.15 650 850
English Learner Status Non English Learner 63,529 745.72 34.76 650 850
English Learner 6,744 708.62 26.08 650 847
Disabilities Students without Disabilities 62,725 747.39 33.78 650 850
Students with Disabilities 12,696 714.57 33.27 650 850
Reading Summative Score 75,421 47.21 14.55 10 90
Gender Female 36,916 48.57 14.40 10 90
Male 38,505 45.90 14.57 10 90
Ethnicity American Indian/Alaska Native 245 46.29 13.20 10 80
Asian 4,673 57.03 13.60 10 90
Black/African American 27,665 42.03 12.99 10 90
Hispanic/Latino 13,731 42.23 13.42 10 90
Native Hawaiian/Pacific Islander 118 49.79 14.33 17 81
Two or more races 3,406 49.97 14.07 10 90
White 25,438 53.30 13.55 10 90
Economic Status* Not Economically Disadvantaged 39,324 52.69 13.89 10 90
Economically Disadvantaged 36,089 41.23 12.78 10 90
English Learner Status Non English Learner 63,529 48.87 14.14 10 90
English Learner 6,744 33.47 9.99 10 83
Disabilities Students without Disabilities 62,725 49.21 13.85 10 90
Students with Disabilities 12,696 37.29 13.83 10 90
Writing Summative Score 75,421 30.98 11.36 10 60
Gender Female 36,916 33.78 10.31 10 60

Group Type Group N Min Max
Gender Male 38,505 10 60
Ethnicity American Indian/Alaska Native 245 10 51
Asian 4,673 10 60
Black/African American 27,665 10 57
Hispanic/Latino 13,731 10 60
Native Hawaiian/Pacific Islander 118 10 57
Two or more races 3,406 10 60
White 25,438 10 60
Economic Status* Not Economically Disadvantaged 39,324 10 60
Economically Disadvantaged 36,089 10 57
English Learner Status Non English Learner 63,529 10 60
English Learner 6,744 10 57
Disabilities Students without Disabilities 62,725 10 60
Students with Disabilities 12,696 10 57

Note: *Economic status was based on participation in National School Lunch Program (NSLP):
receipt of free or reduced-price lunch (FRL).
Table A.12.50 Subgroup Performance for ELA/L Scale Scores: Grade 6

Group Type Group N Mean SD Min Max
Full Summative Score 78,755 740.79 32.99 650 850
Gender Female 38,507 747.65 32.40 650 850
Male 40,248 734.23 32.21 650 850
Ethnicity American Indian/Alaska Native 217 740.33 32.13 657 823
Asian 4,852 763.61 31.31 650 850
Black/African American 27,199 728.86 30.62 650 850
Hispanic/Latino 14,650 731.92 30.22 650 850
Native Hawaiian/Pacific Islander 227 746.11 27.98 670 819
Two or more races 3,836 746.07 31.67 650 850
White 27,559 752.34 30.83 650 850
Economic Status* Not Economically Disadvantaged 44,443 751.63 31.21 650 850
Economically Disadvantaged 34,302 726.75 29.79 650 850
English Learner Status Non English Learner 68,782 743.34 32.13 650 850
English Learner 4,997 707.44 23.86 650 828
Disabilities Students without Disabilities 65,794 745.78 31.12 650 850
Students with Disabilities 12,961 715.46 30.47 650 850
Reading Summative Score 78,755 46.91 13.39 10 90
Gender Female 38,507 48.75 13.20 10 90
Male 40,248 45.15 13.34 10 90
Ethnicity American Indian/Alaska Native 217 46.96 13.10 44 84
Asian 4,852 55.87 12.85 10 90
Black/African American 27,199 41.96 12.11 10 90
Hispanic/Latino 14,650 43.00 12.05 10 90
Native Hawaiian/Pacific Islander 297 48.81 11.59 20 79
Two or more races 3,836 49.54 13.04 10 90
White 27,559 51.85 12.72 10 90
Economic Status* Not Economically Disadvantaged 44,443 51.43 12.83 10 90
Economically Disadvantaged 34,302 41.05 11.72 10 90
English Learner Status Non English Learner 68,782 48.03 13.08 10 90
English Learner 4,997 33.34 9.12 10 84
Disabilities Students without Disabilities 65,794 48.75 12.77 10 90
Students with Disabilities 12,961 37.58 12.55 10 90
Writing Summative Score 78,755 30.17 11.42 10 60
Gender Female 38,507 33.28 10.18 10 60
Male 40,248 27.20 11.75 10 60
Ethnicity American Indian/Alaska Native 217 29.83 11.39 10 52
Asian 4,852 36.64 9.30 10 60
Black/African American 27,199 26.84 11.63 10 60
Hispanic/Latino 14,650 28.15 11.30 10 60
Native Hawaiian/Pacific Islander 227 32.70 9.11 10 53
Two or more races 3,836 31.23 10.96 10 60
White 27,559 33.19 10.33 10 60
Economic Status* Not Economically Disadvantaged 44,443 33.16 10.34 10 60
Economically Disadvantaged 34,302 26.31 11.59 10 60
English Learner Status Non English Learner 68,782 30.82 11.14 10 60
English Learner 4,997 20.46 10.83 10 52
Disabilities Students without Disabilities 65,794 31.95 10.50 10 60
Students with Disabilities 12,961 21.13 11.60 10 60


Note: *Economic status was based on participation in National School Lunch Program (NSLP): receipt 
of free or reduced-price lunch (FRL). 
Table A.12.51 Subgroup Performance for ELA/L Scale Scores: Grade 7

Group Type Group N Mean SD Min Max
Full Summative Score 75,084 745.41 39.49 650 850
Gender Female 36,655 753.93 38.67 650 850
Male 38,429 737.29 38.55 650 850
Ethnicity American Indian/Alaska Native 206 740.40 39.16 650 832
Asian 4,786 770.54 36.64 650 850
Black/African American 26,006 732.07 37.18 650 850
Hispanic/Latino 13,471 733.66 37.19 650 850
Native Hawaiian/Pacific Islander 205 753.44 35.63 650 850
Two or more races 3,555 752.06 37.37 650 850
White 26,669 758.80 36.62 650 850
Economic Status* Not Economically Disadvantaged 43,369 757.92 37.05 650 850
Economically Disadvantaged 31,705 728.31 36.18 650 850
English Learner Status Non English Learner 66,270 748.23 38.30 650 850
English Learner 4,293 702.74 29.24 650 831
Disabilities Students without Disabilities 62,658 751.42 37.24 650 850
Students with Disabilities 12,425 715.12 36.50 650 850
Reading Summative Score 75,084 48.59 16.09 10 90
Gender Female 36,655 51.04 15.84 10 90
Male 38,429 46.25 15.97 10 90
Ethnicity American Indian/Alaska Native 206 46.71 16.01 10 86
Asian 4,786 58.24 15.17 10 90
Black/African American 26,006 43.12 14.88 10 90
Hispanic/Latino 13,471 43.36 15.03 10 90
Native Hawaiian/Pacific Islander 205 51.87 14.86 40 89
Two or more races 3,555 51.92 15.41 10 90
White 26,669 54.32 15.02 10 90
Economic Status* Not Economically Disadvantaged 43,369 53.83 15.22 10 90
Economically Disadvantaged 31,705 41.42 14.38 10 90
English Learner Status Non English Learner 66,270 49.88 15.63 10 90
English Learner 4,293 30.75 11.01 10 75
Disabilities Students without Disabilities 62,658 50.81 15.31 10 90
Students with Disabilities 12,425 37.36 15.18 10 90
Writing Summative Score 75,084 32.20 11.67 10 60
Gender Female 36,655 35.27 10.59 10 60
Male 38,429 29.27 11.90 10 60
Ethnicity American Indian/Alaska Native 206 30.59 12.16 10 55
Asian 4,786 38.84 9.49 10 60
Black/African American 26,006 28.83 11.82 10 60
Hispanic/Latino 13,471 29.85 11.46 10 60
Native Hawaiian/Pacific Islander 205 34.47 10.26 10 57
Two or more races 3,555 33.37 11.12 10 60
White 26,669 35.29 10.67 10 60
Economic Status* Not Economically Disadvantaged 43,369 35.25 10.59 10 60
Economically Disadvantaged 31,705 28.03 11.78 10 60
English Learner Status Non English Learner 66,270 32.78 11.40 10 60
English Learner 4,293 22.13 10.94 10 57
Disabilities Students without Disabilities 62,658 34.05 10.68 10 60
Students with Disabilities 12,425 22.87 11.99 10 60


Note: *Economic status was based on participation in National School Lunch Program (NSLP): receipt 
of free or reduced-price lunch (FRL). 
Table A.12.52 Subgroup Performance for ELA/L Scale Scores: Grade 8

Group Type Group N Mean SD Min Max
Full Summative Score 72,619 743.23 40.62 650 850
Gender Female 35,820 752.27 39.92 650 850
Male 36,799 734.44 39.35 650 850
Ethnicity American Indian/Alaska Native 179 736.44 39.12 650 849
Asian 4,819 770.93 37.03 650 850
Black/African American 25,034 727.68 36.77 650 850
Hispanic/Latino 12,582 730.78 37.79 650 850
Native Hawaiian/Pacific Islander 205 754.19 33.03 668 850
Two or more races 3,263 750.60 39.03 650 850
White 26,380 757.80 38.01 650 850
Economic Status* Not Economically Disadvantaged 43,249 755.82 38.66 650 850
Economically Disadvantaged 29,361 724.70 36.09 650 850
English Learner Status Non English Learner 64,309 746.07 39.63 650 850
English Learner 4,051 700.68 28.33 650 850
Disabilities Students without Disabilities 60,790 748.96 38.72 650 850
Students with Disabilities 11,829 713.79 37.30 650 850
Reading Summative Score 72,619 48.09 16.47 10 90
Gender Female 35,820 50.78 16.34 10 90
Male 36,799 45.47 16.17 10 90
Ethnicity American Indian/Alaska Native 179 45.42 15.21 44 37
Asian 4,819 58.44 15.36 10 90
Black/African American 25,034 41.79 14.62 10 90
Hispanic/Latino 12,582 42.73 15.06 10 90
Native Hawaiian/Pacific Islander 205 50.94 13.31 18 82
Two or more races 3,263 51.49 16.01 10 90
White 26,380 54.26 15.67 10 90
Economic Status* Not Economically Disadvantaged 43,249 53.25 15.87 10 90
Economically Disadvantaged 29,361 40.49 14.24 10 90
English Learner Status Non English Learner 64,309 49.34 16.09 10 90
English Learner 4,051 30.67 10.67 10 87
Disabilities Students without Disabilities 60,790 50.21 15.82 10 90
Students with Disabilities 11,829 37.18 15.40 10 90
Writing Summative Score 72,619 31.14 12.25 10 60
Gender Female 35,820 34.42 11.17 10 60

Group Type Group N Min Max
Gender Male 36,799 10 60
Ethnicity American Indian/Alaska Native 179 10 59
Asian 4,819 10 60
Black/African American 25,034 10 60
Hispanic/Latino 12,582 10 60
Native Hawaiian/Pacific Islander 205 10 60
Two or more races 3,263 10 60
White 26,380 10 60
Economic Status* Not Economically Disadvantaged 43,249 10 60
Economically Disadvantaged 29,361 10 60
English Learner Status Non English Learner 64,309 10 60
English Learner 4,051 10 60
Disabilities Students without Disabilities 60,790 10 60
Students with Disabilities 11,829 10 60


Note: *Economic status was based on participation in National School Lunch Program (NSLP): 
receipt of free or reduced-price lunch (FRL). 
Table A.12.53 Subgroup Performance for ELA/L Scale Scores: Grade 9

Group Type Group N Mean SD Min Max
Full Summative Score 3,388 732.86 40.47 650 850
Gender Female 1,735 739.65 39.55 650 850
Male 1,653 725.73 40.20 650 850
Ethnicity American Indian/Alaska Native n/r n/r n/r n/r n/r
Asian 48 779.06 31.66 701 840
Black/African American 2,367 725.61 35.12 650 827
Hispanic/Latino 579 725.67 40.03 650 843
Native Hawaiian/Pacific Islander n/r n/r n/r n/r n/r
Two or more races n/r n/r n/r n/r n/r
White 330 784.25 34.27 650 850
Economic Status* Not Economically Disadvantaged 793 769.07 37.22 650 850
Economically Disadvantaged 2,594 721.80 34.52 650 840
English Learner Status Non English Learner n/r n/r n/r n/r n/r
English Learner 284 705.59 34.49 650 830
Disabilities Students without Disabilities 2,635 739.60 39.35 650 850
Students with Disabilities 753 709.27 35.16 650 841
Reading Summative Score 3,388 43.65 16.67 10 90
Gender Female 1,735 45.68 16.50 10 90
Male 1,653 41.52 16.60 10 90
Ethnicity American Indian/Alaska Native n/r n/r n/r n/r n/r
Asian 48 62.06 12.66 32 90
Black/African American 2,367 40.71 14.54 10 84
Hispanic/Latino 579 40.31 16.13 10 90
Native Hawaiian/Pacific Islander n/r n/r n/r n/r n/r
Two or more races n/r n/r n/r n/r n/r
White 330 65.22 13.73 10 90
Economic Status* Not Economically Disadvantaged 793 58.74 15.29 10 90
Economically Disadvantaged 2,594 39.04 14.17 10 89
English Learner Status Non English Learner n/r n/r n/r n/r n/r
English Learner 284 32.22 13.42 10 79
Disabilities Students without Disabilities 2,635 46.25 16.29 10 90
Students with Disabilities 753 34.55 14.70 10 86
Writing Summative Score 3,388 28.15 12.51 10 60
Gender Female 1,735 30.91 11.58 10 60
Male 1,653 25.26 12.79 10 54
Ethnicity American Indian/Alaska Native n/r n/r n/r n/r n/r
Asian 48 40.25 7.97 10 54
Black/African American 2,367 26.36 11.75 10 54
Hispanic/Latino 579 26.51 12.67 10 54
Native Hawaiian/Pacific Islander n/r n/r n/r n/r n/r
Two or more races n/r n/r n/r n/r n/r
White 330 40.48 9.61 10 60
Economic Status* Not Economically Disadvantaged 793 37.14 10.60 10 60
Economically Disadvantaged 2,594 25.41 11.73 10 54
English Learner Status Non English Learner n/r n/r n/r n/r n/r
English Learner 284 20.80 11.83 10 53
Disabilities Students without Disabilities 2,635 30.30 11.86 10 60
Students with Disabilities 753 20.63 11.80 10 53


Note: This table is identical to Table 12.8 in Section 12. *Economic status was based on participation in 
National School Lunch Program (NSLP): receipt of free or reduced-price lunch (FRL). n/r = not reported due to n<20.




Table A.12.54 Subgroup Performance for ELA/L Scale Scores: Grade 10

Group Type Group N Mean SD Min Max
Full Summative Score 73,487 748.75 48.17 650 850
Gender Female 35,774 756.75 47.44 650 850
Male 37,713 741.17 47.63 650 850
Ethnicity American Indian/Alaska Native 173 748.45 45.61 654 850
Asian 4,860 781.14 42.52 650 850
Black/African American 26,732 730.73 43.44 650 850
Hispanic/Latino 13,203 729.89 46.27 650 850
Native Hawaiian/Pacific Islander 148 751.28 41.97 650 850
Two or more races 3,149 762.46 43.87 650 850
White 25,083 769.84 42.77 650 850
Economic Status* Not Economically Disadvantaged 45,160 762.24 46.15 650 850
Economically Disadvantaged 28,224 727.38 43.27 650 850
English Learner Status Non English Learner 63,915 754.83 45.87 650 850
English Learner 5,856 693.89 31.64 650 850
Disabilities Students without Disabilities 60,912 755.05 46.38 650 850
Students with Disabilities 12,574 718.26 44.91 650 850
Reading Summative Score 73,487 50.17 19.63 10 90
Gender Female 35,774 52.17 19.37 10 90
Male 37,713 48.26 19.70 10 90
Ethnicity American Indian/Alaska Native 173 50.06 18.51 40 90
Asian 4,860 62.89 18.12 10 90
Black/African American 26,732 43.07 17.43 10 90
Hispanic/Latino 13,203 42.04 18.62 10 90
Native Hawaiian/Pacific Islander 148 50.61 17.03 10 90
Two or more races 3,149 56.32 18.06 10 90
White 25,083 58.73 17.75 10 90
Economic Status* Not Economically Disadvantaged 45,160 55.74 18.99 10 90
Economically Disadvantaged 28,224 41.32 17.22 10 90
English Learner Status Non English Learner 63,915 52.66 18.74 10 90
English Learner 5,856 27.50 11.95 10 90
Disabilities Students without Disabilities 60,912 52.53 18.99 10 90
Students with Disabilities 12,574 38.73 18.68 10 90
Writing Summative Score 73,487 33.03 12.76 10 60
Gender Female 35,774 35.82 12.11 10 60
Male 37,713 30.38 12.80 10 60
Ethnicity American Indian/Alaska Native 173 33.06 12.16 10 60
Asian 4,860 40.83 10.37 10 60
Black/African American 26,732 28.61 12.32 10 60
Hispanic/Latino 13,203 29.14 12.53 10 60
Native Hawaiian/Pacific Islander 148 34.53 10.82 10 59
Two or more races 3,149 35.81 11.76 10 60
White 25,083 37.90 11.27 10 60
Economic Status* Not Economically Disadvantaged 45,160 36.10 12.06 10 60
Economically Disadvantaged 28,224 28.16 12.32 10 60
English Learner Status Non English Learner 63,915 34.41 12.24 10 60
English Learner 5,856 20.73 10.42 10 53
Disabilities Students without Disabilities 60,912 34.67 12.19 10 60
Students with Disabilities 12,574 25.05 12.48 10 60


Note: *Economic status was based on participation in National School Lunch Program (NSLP): receipt 
of free or reduced-price lunch (FRL). 
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Table A.12.55 Subgroup Performance for ELA/L Scale Scores: Grade 11 


2019 Technical Report 


Group Type              Group                             N       Mean    SD     Min  Max
Full Summative Score                                      60      710.90  30.29  658  814
Gender                  Female                            25      711.92  26.11  658  785
                        Male                              35      710.17  33.31  667  814
Ethnicity               American Indian/Alaska Native     n/r     n/r     n/r    n/r  n/r
                        Asian                             n/r     n/r     n/r    n/r  n/r
                        Black/African American            32      709.88  26.92  658  768
                        Hispanic/Latino                   26      706.08  26.35  672  785
                        Native Hawaiian/Pacific Islander  n/r     n/r     n/r    n/r  n/r
                        Two or more races                 n/r     n/r     n/r    n/r  n/r
                        White                             n/r     n/r     n/r    n/r  n/r
Economic Status*        Not Economically Disadvantaged    n/r     n/r     n/r    n/r  n/r
                        Economically Disadvantaged        43      707.42  26.58  658  785
English Learner Status  Non English Learner               n/r     n/r     n/r    n/r  n/r
                        English Learner                   n/r     n/r     n/r    n/r  n/r
Disabilities            Students without Disabilities     34      706.88  26.21  667  785
                        Students with Disabilities        26      716.15  34.76  658  814
Reading Summative Score                                   60      35.87   11.83  15   75
Gender                  Female                            25      34.72   9.83   15   58
                        Male                              35      36.69   13.16  19   75
Ethnicity               American Indian/Alaska Native     n/r     n/r     n/r    n/r  n/r
                        Asian                             n/r     n/r     n/r    n/r  n/r
                        Black/African American            32      36.06   10.71  15   60
                        Hispanic/Latino                   26      33.12   9.57   21   58
                        Native Hawaiian/Pacific Islander  n/r     n/r     n/r    n/r  n/r
                        Two or more races                 n/r     n/r     n/r    n/r  n/r
                        White                             n/r     n/r     n/r    n/r  n/r
Economic Status*        Not Economically Disadvantaged    n/r     n/r     n/r    n/r  n/r
                        Economically Disadvantaged        43      34.47   9.97   15   58
English Learner Status  Non English Learner               n/r     n/r     n/r    n/r  n/r
                        English Learner                   n/r     n/r     n/r    n/r  n/r
Disabilities            Students without Disabilities     34      34.29   9.70   19   58
                        Students with Disabilities        26      37.92   14.09  15   75
Writing Summative Score                                   60      20.47   11.54  10   48
Gender                  Female                            25      23.48   11.00  10   45
                        Male                              35      18.31   11.59  10   48
Ethnicity               American Indian/Alaska Native     n/r     n/r     n/r    n/r  n/r
                        Asian                             n/r     n/r     n/r    n/r  n/r
                        Black/African American            32      18.66   10.97  10   38
                        Hispanic/Latino                   26      21.08   11.04  10   45
                        Native Hawaiian/Pacific Islander  n/r     n/r     n/r    n/r  n/r
                        Two or more races                 n/r     n/r     n/r    n/r  n/r
                        White                             n/r     n/r     n/r    n/r  n/r
Economic Status*        Not Economically Disadvantaged    n/r     n/r     n/r    n/r  n/r
                        Economically Disadvantaged        43      19.65   10.95  10   45
English Learner Status  Non English Learner               n/r     n/r     n/r    n/r  n/r
                        English Learner                   n/r     n/r     n/r    n/r  n/r
Disabilities            Students without Disabilities     34      18.79   11.12  10   45
                        Students with Disabilities        26      22.65   11.93  10   48


Note: *Economic status was based on participation in National School Lunch Program (NSLP): receipt 
of free or reduced-price lunch (FRL). n/r = not reported due to n<20. 
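
The suppression rule stated in the note above (n/r when n<20) can be illustrated with a minimal Python sketch. The function name, the sample scores, and the use of the sample standard deviation are assumptions made for illustration; the report does not describe the software or the exact SD formula used to produce these tables.

```python
from statistics import mean, stdev

MIN_N = 20  # reporting threshold taken from the table note (n<20 is not reported)


def summarize_subgroup(scores, min_n=MIN_N):
    """Return N, Mean, SD, Min, Max for one subgroup, or n/r in every cell."""
    n = len(scores)
    if n < min_n:
        # Too few students: suppress every statistic, including N.
        return {key: "n/r" for key in ("N", "Mean", "SD", "Min", "Max")}
    return {
        "N": n,
        "Mean": round(mean(scores), 2),
        # Sample SD is an assumption; the report does not state which form it uses.
        "SD": round(stdev(scores), 2),
        "Min": min(scores),
        "Max": max(scores),
    }


# Hypothetical scale scores, for illustration only.
small_group = [700 + i for i in range(19)]  # n = 19, suppressed
large_group = [700 + i for i in range(20)]  # n = 20, reported

print(summarize_subgroup(small_group))
print(summarize_subgroup(large_group))
```

A group of 19 students returns n/r in every column, while a group of 20 returns the five statistics in the same order as the table columns.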
Table A.12.56 Subgroup Performance for Mathematics Scale Scores: Grade 3


Group Type              Group                             N       Mean    SD     Min  Max
Full Summative Score                                      79,070  741.90  38.02  650  850
Gender                  Female                            38,807  741.95  37.13  650  850
                        Male                              40,263  741.86  38.86  650  850
Ethnicity               American Indian/Alaska Native     219     735.93  35.23  650  822
                        Asian                             4,811   770.17  35.25  650  850
                        Black/African American            26,772  727.66  35.46  650  850
                        Hispanic/Latino                   15,070  730.89  34.83  650  850
                        Native Hawaiian/Pacific Islander  185     747.44  34.08  650  838
                        Two or more races                 4,174   748.34  36.78  650  850
                        White                             27,554  755.69  34.80  650  850
Economic Status*        Not Economically Disadvantaged
                        Economically Disadvantaged        35,169  726.18  34.73  650  850
English Learner Status  Non English Learner               62,145  745.33  37.96  650  850
                        English Learner                   11,464  723.89  33.09  650  850
Disabilities            Students without Disabilities     67,214  746.05  36.59  650  850
                        Students with Disabilities        11,856  718.42  37.47  650  850
Language Form           Spanish                           139     709.93  40.59  650  807


Note: This table is identical to Table 12.9 in Section 12. *Economic status was based on participation 
in National School Lunch Program (NSLP): receipt of free or reduced-price lunch (FRL). 
Table A.12.57 Subgroup Performance for Mathematics Scale Scores: Grade 4


Group Type              Group                             N       Mean    SD     Min  Max
Full Summative Score                                      80,595  739.72  35.70  650  850
Gender                  Female                            39,553  740.05  34.54  650  850
                        Male                              41,042  739.39  36.78  650  850
Ethnicity               American Indian/Alaska Native     194     738.22  30.50  663  807
                        Asian                             4,884   768.30  33.76  650  850
                        Black/African American            27,521  724.71  32.44  650  850
                        Hispanic/Latino                   15,642  729.69  32.61  650  850
                        Native Hawaiian/Pacific Islander  218     743.05  35.66  650  830
                        Two or more races                 4,210   746.93  33.93  650  850
                        White                             27,685  753.98  31.97  650  850
Economic Status*        Not Economically Disadvantaged    44,338  752.05  33.83  650  850
                        Economically Disadvantaged        36,243  724.64  31.93  650  850
English Learner Status  Non English Learner               64,505  743.11  35.41  650  850
                        English Learner                   10,747  719.89  29.97  650  850
Disabilities            Students without Disabilities     67,628  744.13  34.27  650  850
                        Students with Disabilities        12,966  716.68  34.13  650  850
Language Form           Spanish                           147     701.18  26.00  650  779


Note: *Economic status was based on participation in National School Lunch Program (NSLP): 
receipt of free or reduced-price lunch (FRL). 
Table A.12.58 Subgroup Performance for Mathematics Scale Scores: Grade 5


Group Type              Group                             N       Mean    SD     Min  Max
Full Summative Score                                      81,489  738.87  34.28  650  850
Gender                  Female                            39,956  739.63  33.10  650  850
                        Male                              41,533  738.14  35.35  650  850
Ethnicity               American Indian/Alaska Native     279     735.43  31.01  672  830
                        Asian                             5,067   767.86  34.21  650  850
                        Black/African American            28,340  724.64  29.15  650  850
                        Hispanic/Latino                   15,345  728.60  30.61  650  850
                        Native Hawaiian/Pacific Islander  223     743.53  30.20  650  826
                        Two or more races                 4,073   744.65  33.38  650  850
                        White                             27,957  752.68  32.19  650  850
Economic Status*        Not Economically Disadvantaged    45,154  750.89  33.44  650  850
                        Economically Disadvantaged        36,327  723.94  29.02  650  850
English Learner Status  Non English Learner               68,510  741.97  33.99  650  850
                        English Learner                   7,828   714.02  26.13  650  832
Disabilities            Students without Disabilities     67,996  742.85  33.75  650  850
                        Students with Disabilities        13,493  718.81  29.54  650  850
Language Form           Spanish                           152     701.18  29.15  650  790


Note: *Economic status was based on participation in National School Lunch Program (NSLP): 
receipt of free or reduced-price lunch (FRL). 
Table A.12.59 Subgroup Performance for Mathematics Scale Scores: Grade 6


Group Type              Group                             N       Mean    SD     Min  Max
Full Summative Score                                      78,665  731.84  32.64  650  850
Gender                  Female                            38,504  733.51  31.62  650  850
                        Male                              40,161  730.24  33.51  650  850
Ethnicity               American Indian/Alaska Native     215     731.16  32.02  650  815
                        Asian                             4,767   760.00  32.01  650  850
                        Black/African American            27,095  717.57  28.22  650  850
                        Hispanic/Latino                   14,909  722.32  28.79  650  836
                        Native Hawaiian/Pacific Islander  226     736.49  24.18  668  786
                        Two or more races                 3,812   738.16  31.86  650  850
                        White                             27,429  745.22  30.09  650  850
Economic Status*        Not Economically Disadvantaged    44,259  743.34  31.36  650  850
                        Economically Disadvantaged        34,396  717.05  27.95  650  850
English Learner Status  Non English Learner               68,261  734.47  32.04  650  850
                        English Learner                   5,444   703.37  25.71  650  814
Disabilities            Students without Disabilities     65,793  735.72  31.77  650  850
                        Students with Disabilities        12,872  712.03  29.68  650  850
Language Form           Spanish                           112     698.04  22.08  650  761
Note: *Economic status was based on participation in National School Lunch Program (NSLP): 
receipt of free or reduced-price lunch (FRL). 
Table A.12.60 Subgroup Performance for Mathematics Scale Scores: Grade 7


Group Type              Group                             N       Mean    SD     Min  Max
Full Summative Score                                      62,845  731.61  28.53  650  850
Gender                  Female                            30,782  733.04  27.82  650  850
                        Male                              32,063  730.25  29.14  650  850
Ethnicity               American Indian/Alaska Native     175     729.10  26.60  6850 831
                        Asian                             2,655   749.43  28.50  650  850
                        Black/African American            24,238  721.43  24.26  650  842
                        Hispanic/Latino                   12,161  723.67  25.44  650  850
                        Native Hawaiian/Pacific Islander  98      741.41  28.14  665  840
                        Two or more races                 2,480   736.10  27.73  650  850
                        White                             20,943  745.08  28.14  650  850
Economic Status*        Not Economically Disadvantaged    32,175  741.57  28.61  650  850
                        Economically Disadvantaged        30,660  721.17  24.42  650  850
English Learner Status  Non English Learner               54,012  733.46  28.04  650  850
                        English Learner                   4,406   708.20  21.41  650  817
Disabilities            Students without Disabilities     51,514  735.18  27.91  650  850
                        Students with Disabilities        11,330  715.40  25.57  650  850
Language Form           Spanish                           139     703.23  20.71  650  757
Note: *Economic status was based on participation in National School Lunch Program (NSLP): 
receipt of free or reduced-price lunch (FRL). 
Table A.12.61 Subgroup Performance for Mathematics Scale Scores: Grade 8


Group Type              Group                             N       Mean    SD     Min  Max
Full Summative Score                                      39,807  710.70  32.65  650  850
Gender                  Female                            18,899  713.40  32.19  650  842
                        Male                              20,908  708.26  32.86  650  850
Ethnicity               American Indian/Alaska Native     98      706.96  33.42  650  801
                        Asian                             980     727.49  35.24  650  850
                        Black/African American            18,054  703.10  29.51  650  837
                        Hispanic/Latino                   7,704   704.45  30.38  650  833
                        Native Hawaiian/Pacific Islander  74      725.96  33.77  655  796
                        Two or more races                 1,384   715.20  32.48  650  850
                        White                             11,470  724.71  33.24  650  850
Economic Status*        Not Economically Disadvantaged    17,757  719.89  33.69  650  850
                        Economically Disadvantaged        22,040  703.29  29.78  650  837
English Learner Status  Non English Learner               32,890  712.39  32.13  650  850
                        English Learner                   3,538   691.04  25.61  650  818
Disabilities            Students without Disabilities     30,967  714.73  32.32  650  850
                        Students with Disabilities        8,840   696.57  29.73  650  850
Language Form           Spanish                           149     688.30  21.36  650  773


Note: *Economic status was based on participation in National School Lunch Program (NSLP): 
receipt of free or reduced-price lunch (FRL). 
Table A.12.62 Subgroup Performance for Mathematics Scale Scores: Algebra I


Group Type              Group                             N       Mean    SD     Min  Max
Full Summative Score                                      84,749  735.17  34.55  650  850
Gender                  Female                            41,214  736.26  33.63  650  850
                        Male                              43,535  734.14  35.36  650  850
Ethnicity               American Indian/Alaska Native     200     731.53  29.35  650  835
                        Asian                             5,384   764.06  37.11  650  850
                        Black/African American            31,315  720.80  26.90  650  850
                        Hispanic/Latino                   16,225  721.86  28.59  650  850
                        Native Hawaiian/Pacific Islander  176     743.73  28.06  678  820
                        Two or more races                 3,650   743.40  33.84  650  850
                        White                             27,641  752.43  33.38  650  850
Economic Status*        Not Economically Disadvantaged    50,459  745.57  35.41  650  850
                        Economically Disadvantaged        34,201  719.90  26.68  650  850
English Learner Status  Non English Learner               74,256  738.14  34.30  650  850
                        English Learner                   7,253   708.67  23.32  650  850
Disabilities            Students without Disabilities     69,867  738.95  34.39  650  850
                        Students with Disabilities        14,882  717.41  29.35  650  850
Language Form           Spanish                           705     704.80  22.37  650  805
Note: This table is identical to Table 12.10 in Section 12. *Economic status was based on 
participation in National School Lunch Program (NSLP): receipt of free or reduced-price lunch 
(FRL). 
Table A.12.63 Subgroup Performance for Mathematics Scale Scores: Geometry


Group Type              Group                             N       Mean    SD     Min  Max
Full Summative Score                                      13,390  746.49  31.57  650  850
Gender                  Female                            6,490   746.40  30.50  650  850
                        Male                              6,900   746.58  32.54  650  850
Ethnicity               American Indian/Alaska Native     30      736.43  30.97  671  784
                        Asian                             2,045   771.69  25.46  674  850
                        Black/African American            3,936   724.76  27.27  650  815
                        Hispanic/Latino                   1,728   732.62  27.94  650  813
                        Native Hawaiian/Pacific Islander  80      740.40  21.98  682  784
                        Two or more races                 755     751.96  26.19  656  832
                        White                             4,674   758.44  24.85  650  850
Economic Status*        Not Economically Disadvantaged    9,435   756.42  27.77  650  850
                        Economically Disadvantaged        3,955   722.80  27.16  650  815
English Learner Status  Non English Learner               9,383   757.06  26.68  650  850
                        English Learner                   593     715.86  25.96  650  806
Disabilities            Students without Disabilities     11,869  749.40  30.12  650  850
                        Students with Disabilities        1,521   723.78  33.38  650  829
Language Form           Spanish                           59      707.34  20.02  660  777
Note: *Economic status was based on participation in National School Lunch Program (NSLP): 
receipt of free or reduced-price lunch (FRL). 
Table A.12.64 Subgroup Performance for Mathematics Scale Scores: Algebra II


Group Type              Group                             N       Mean    SD     Min  Max
Full Summative Score                                      5,129   749.13  41.51  650  850
Gender                  Female                            2,574   746.60  40.39  650  850
                        Male                              2,555   751.69  42.47  650  850
Ethnicity               American Indian/Alaska Native     n/r     n/r     n/r    n/r  n/r
                        Asian                             860     774.40  39.77  656  850
                        Black/African American            585     726.26  37.63  650  850
                        Hispanic/Latino                   693     724.56  34.33  650  823
                        Native Hawaiian/Pacific Islander  72      720.51  31.67  656  831
                        Two or more races                 457     743.35  39.41  650  850
                        White                             2,376   755.42  38.09  650  850
Economic Status*        Not Economically Disadvantaged    4,895   749.48  41.54  650  850
                        Economically Disadvantaged        234     741.82  40.39  650  850
English Learner Status  Non English Learner               4,805   750.51  40.97  650  850
                        English Learner                   139     706.79  31.19  650  791
Disabilities            Students without Disabilities     4,760   750.97  40.59  650  850
                        Students with Disabilities        369     725.46  45.80  650  850
Language Form           Spanish                           n/r     n/r     n/r    n/r  n/r


Note: *Economic status was based on participation in National School Lunch Program (NSLP): 
receipt of free or reduced-price lunch (FRL). n/r = not reported due to n<20. 
Table A.12.65 Subgroup Performance for Mathematics Scale Scores: Integrated Mathematics I


Group Type              Group                             N       Mean    SD     Min  Max
Full Summative Score                                      144     729.53  27.16  662  816
Gender                  Female                            84      730.30  26.85  678  816
                        Male                              60      728.45  27.78  662  809
Ethnicity               American Indian/Alaska Native     n/r     n/r     n/r    n/r  n/r
                        Asian                             n/r     n/r     n/r    n/r  n/r
                        Black/African American            72      725.64  25.31  662  785
                        Hispanic/Latino                   54      729.02  28.08  678  816
                        Native Hawaiian/Pacific Islander  n/r     n/r     n/r    n/r  n/r
                        Two or More Races                 n/r     n/r     n/r    n/r  n/r
                        White                             n/r     n/r     n/r    n/r  n/r
Economic Status*        Not Economically Disadvantaged    46      737.41  28.08  662  809
                        Economically Disadvantaged        98      725.3   26.05  678  816
English Learner Status  Non English Learner               n/r     n/r     n/r    n/r  n/r
                        English Learner                   n/r     n/r     n/r    n/r  n/r
Disabilities            Students without Disabilities     110     731.85  24.65  678  816
                        Students with Disabilities        34      722.03  33.38  662  809
Language Form           Spanish                           n/r     n/r     n/r    n/r  n/r


Note: This table is identical to Table 12.11 in Section 12. *Economic status was based on 
participation in National School Lunch Program (NSLP): receipt of free or reduced-price lunch 
(FRL). n/r = not reported due to n<20. 


New Meridian 


February 28, 2020 




Table A.12.66 Subgroup Performance for Mathematics Scale Scores: Integrated Mathematics II 


Group Type              Group                             N       Mean    SD     Min  Max
Full Summative Score                                      190     740.24  32.06  678  818
Gender                  Female                            88      735.80  27.65  678  805
                        Male                              102     744.08  35.11  678  818
Ethnicity               American Indian/Alaska Native     n/r     n/r     n/r    n/r  n/r
                        Asian                             n/r     n/r     n/r    n/r  n/r
                        Black/African American            56      733.63  27.22  678  779
                        Hispanic/Latino                   61      726.20  28.93  678  805
                        Native Hawaiian/Pacific Islander  n/r     n/r     n/r    n/r  n/r
                        Two or more races                 n/r     n/r     n/r    n/r  n/r
                        White                             48      757.69  30.97  692  818
Economic Status*        Not Economically Disadvantaged    108     751.21  31.21  685  818
                        Economically Disadvantaged        82      725.79  27.20  678  805
English Learner Status  Non English Learner               n/r     n/r     n/r    n/r  n/r
                        English Learner                   n/r     n/r     n/r    n/r  n/r
Disabilities            Students without Disabilities     153     743.33  31.33  685  818
                        Students with Disabilities        37      727.46  32.32  678  782
Language Form           Spanish                           n/r     n/r     n/r    n/r  n/r


Note: *Economic status was based on participation in National School Lunch Program (NSLP): 
receipt of free or reduced-price lunch (FRL). n/r = not reported due to n<20. 
Table A.12.67 Subgroup Performance for Mathematics Scale Scores: Integrated Mathematics III


Group Type              Group                             N       Mean    SD     Min  Max
Full Summative Score                                      n/r     n/r     n/r    n/r  n/r
Gender                  Female                            n/r     n/r     n/r    n/r  n/r
                        Male                              n/r     n/r     n/r    n/r  n/r
Ethnicity               American Indian/Alaska Native     n/r     n/r     n/r    n/r  n/r
                        Asian                             n/r     n/r     n/r    n/r  n/r
                        Black/African American            n/r     n/r     n/r    n/r  n/r
                        Hispanic/Latino                   n/r     n/r     n/r    n/r  n/r
                        Native Hawaiian/Pacific Islander  n/r     n/r     n/r    n/r  n/r
                        Two or More Races                 n/r     n/r     n/r    n/r  n/r
                        White                             n/r     n/r     n/r    n/r  n/r
Economic Status*        Not Economically Disadvantaged    n/r     n/r     n/r    n/r  n/r
                        Economically Disadvantaged        n/r     n/r     n/r    n/r  n/r
English Learner Status  Non English Learner               n/r     n/r     n/r    n/r  n/r
                        English Learner                   n/r     n/r     n/r    n/r  n/r
Disabilities            Students without Disabilities     n/r     n/r     n/r    n/r  n/r
                        Students with Disabilities        n/r     n/r     n/r    n/r  n/r
Language Form           Spanish                           n/r     n/r     n/r    n/r  n/r
Note: *Economic status was based on participation in National School Lunch Program (NSLP): 
receipt of free or reduced-price lunch (FRL). n/r = not reported due to n<20. 
Appendix 13.1: Test Reliability Estimates by Subgroups by Grade/Subject


Table A.13.1 Summary of Test Reliability Estimates for Subgroups: ELA/L Grade 3 


Max. Raw Avg. Avg. Minimum Reliability Maximum Reliability
Score SEM Reliability N Alpha N Alpha
Total Group 82 4.38 0.92 1,694 0.82 34,846 0.92
Gender 
Male 82 4.28 0.92 1,101 0.81 17,612 0.92 
Female 82 4.49 0.92 593 0.83 17,234 0.92 
Ethnicity 
White 82 4.51 0.91 388 0.87 12,072 0.91 
Black/African American 82 4.25 0.90 831 0.72 12,468 0.91 
Asian/Pacific Islander 82 4.69 0.90 2,097 0.90 2,184 0.91 
American Indian/Alaska Native n/r n/r n/r n/r n/r n/r n/r
Hispanic/Latino 82 4.26 0.91 140 0.80 6,353 0.91 
Multiple 82 4.52 0.92 1,584 0.91 1,713 0.92 
Special Instruction Needs 
Economically Disadvantaged 82 4.20 0.90 1,237 0.75 16,535 0.91 
Not Economically Disadvantaged 82 4.53 0.91 453 0.88 18,224 0.91
English Learner 82 4.10 0.88 349 0.74 4,768 0.89 
Non-English Learner 82 4.44 0.92 1,099 0.83 27,561 0.92 
Students with Disabilities 82 3.83 0.90 1,694 0.82 4,175 0.92 
Students without Disabilities 82 4.47 0.92 30,293 0.92 30,671 0.92 
Students Taking Accommodated Forms
ASL n/r n/r n/r n/r n/r n/r n/r 
Closed-Caption n/r n/r n/r n/r n/r n/r n/r 
Screen Reader n/r n/r n/r n/r n/r n/r n/r 
Text-to-Speech 82 3.21 0.80 1,639 0.80 1,639 0.80 


n/r = not reported due to n<100. 
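
The "Alpha" and "Avg. SEM" columns in these reliability tables can be related through a short Python sketch. It uses the textbook coefficient alpha and the classical relationship SEM = SD × sqrt(1 − reliability). The function names and the response matrix are hypothetical, and this appendix does not show whether the reported estimates were computed with exactly these formulas (for example, a stratified alpha could have been used).

```python
from math import sqrt
from statistics import pstdev, pvariance


def coefficient_alpha(score_matrix):
    """Coefficient alpha for rows of per-student item scores (one column per item)."""
    k = len(score_matrix[0])                    # number of items
    item_columns = list(zip(*score_matrix))     # transpose: one tuple per item
    item_variance_sum = sum(pvariance(col) for col in item_columns)
    total_scores = [sum(row) for row in score_matrix]
    return (k / (k - 1)) * (1 - item_variance_sum / pvariance(total_scores))


def standard_error_of_measurement(score_sd, reliability):
    """Classical test theory SEM on the same scale as score_sd."""
    return score_sd * sqrt(1 - reliability)


# Hypothetical responses: 4 students x 3 dichotomous items, for illustration only.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

alpha = coefficient_alpha(responses)
raw_sd = pstdev([sum(row) for row in responses])
print(round(alpha, 2), round(standard_error_of_measurement(raw_sd, alpha), 2))
```

Because the same variance estimator is used for the items and for the total score, the choice between population and sample variance does not change alpha in this sketch.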
Table A.13.2 Summary of Test Reliability Estimates for Subgroups: ELA/L Grade 4


n/r = not reported due to n<100.


Max. Raw Avg. Avg. Minimum Reliability Maximum Reliability
Score SEM Reliability N Alpha N Alpha
Total Group 106 5.36 0.92 2,099 0.83 31,699 0.92 
Gender 
Male 106 5.23 0.91 1,367 0.81 15,940 0.92 
Female 106 5.48 0.92 732 0.84 15,759 0.92 
Ethnicity 
White 106 5.47 0.91 162 0.87 10,970 0.91 
Black/African American 106 5.22 0.90 1,094 0.74 11,183 0.91 
Asian/Pacific Islander 106 5.56 0.91 1,982 0.90 2,416 0.91 
American Indian/Alaska Native n/r n/r n/r n/r n/r n/r n/r 
Hispanic/Latino 106 5.28 0.90 468 0.68 5,878 0.91 
Multiple 106 5.53 0.91 1,511 0.91 1,791 0.92 
Special Instruction Needs 
Economically Disadvantaged 106 5.22 0.89 1,500 0.76 18,244 0.90 
Not Economically Disadvantaged 106 5.48 0.91 193 0.86 16,691 0.91 
English Learner 106 5.07 0.85 151 0.69 3,945 0.86 
Non-English Learner 106 5.40 0.92 1,421 0.84 25,642 0.92 
Students with Disabilities 106 4.76 0.90 2,099 0.83 4,155 0.92 
Students without Disabilities 106 5.47 0.91 33,355 0.91 27,943 0.91 
Students Taking Accommodated Forms
ASL n/r n/r n/r n/r n/r n/r n/r 
Closed-Caption n/r n/r n/r n/r n/r n/r n/r 
Screen Reader n/r n/r n/r n/r n/r n/r n/r 
Text-to-Speech 106 3.92 0.79 2,044 0.79 2,044 0.79 
Table A.13.3 Summary of Test Reliability Estimates for Subgroups: ELA/L Grade 5


Max. Raw Avg. Avg. Minimum Reliability Maximum Reliability
Score SEM Reliability N Alpha N Alpha
Total Group 106 5.40 0.92 531 0.82 30,670 0.93 
Gender 
Male 106 5.27 0.92 367 0.82 15,535 0.93 
Female 106 5.50 0.92 738 0.82 15,135 0.92 
Ethnicity 
White 106 5.45 0.91 139 0.86 14,018 0.92 
Black/African American 106 5.28 0.90 1,128 0.78 10,974 0.91 
Asian/Pacific Islander 106 5.51 0.91 1,985 0.91 2,609 0.91 
American Indian/Alaska Native 106 5.19 0.93 140 0.93 140 0.93 
Hispanic/Latino 106 5.32 0.91 468 0.74 5,516 0.92 
Multiple 106 5.52 0.92 1,863 0.92 1,403 0.92 
Special Instruction Needs 
Economically Disadvantaged 106 5.26 0.90 328 0.75 14,211 0.91 
Not Economically Disadvantaged 106 5.47 0.92 199 0.88 21,665 0.92 
English Learner 106 4.75 0.85 428 0.71 3,468 0.87 
Non-English Learner 106 5.44 0.92 347 0.85 26,080 0.92 
Students with Disabilities 106 4.81 0.90 526 0.82 4,170 0.93 
Students without Disabilities 106 5.49 0.92 35,137 0.92 26,500 0.92 
Students Taking Accommodated Forms
ASL n/r n/r n/r n/r n/r n/r n/r 
Closed-Caption n/r n/r n/r n/r n/r n/r n/r 
Screen Reader n/r n/r n/r n/r n/r n/r n/r 
Text-to-Speech 106 3.89 0.81 2,044 0.81 2,044 0.81 


n/r = not reported due to n<100. 
Table A.13.4 Summary of Test Reliability Estimates for Subgroups: ELA/L Grade 6


Max. Raw Avg. Avg. Minimum Reliability Maximum Reliability
Score SEM Reliability N Alpha N Alpha
Total Group 109 5.50 0.93 1,967 0.86 37,982 0.94 
Gender 
Male 109 5.32 0.93 1,291 0.85 19,298 0.94 
Female 109 5.67 0.93 676 0.88 18,684 0.93 
Ethnicity 
White 109 5.55 0.93 128 0.90 13,437 0.93 
Black/African American 109 5.41 0.92 1,036 0.77 12,840 0.93 
Asian/Pacific Islander 109 5.58 0.93 2,417 0.93 2,389 0.93 
American Indian/Alaska Native 109 5.49 0.93 109 0.93 104 0.93 
Hispanic/Latino 109 5.47 0.92 455 0.83 7,072 0.93 
Multiple 109 5.59 0.93 1,849 0.93 1,905 0.93 
Special Instruction Needs 
Economically Disadvantaged 109 5.37 0.91 1,373 0.77 16,260 0.92 
Not Economically Disadvantaged 109 5.59 0.93 147 0.91 21,721 0.93 
English Learner 109 4.74 0.87 376 0.71 2,342 0.88 
Non-English Learner 109 5.54 0.93 1,371 0.88 33,412 0.94 
Students with Disabilities 109 4.84 0.92 1,967 0.86 5,364 0.94 
Students without Disabilities 109 5.62 0.93 32,525 0.93 32,618 0.93 
Students Taking Accommodated Forms
ASL n/r n/r n/r n/r n/r n/r n/r 
Closed-Caption n/r n/r n/r n/r n/r n/r n/r 
Screen Reader n/r n/r n/r n/r n/r n/r n/r 
Text-to-Speech 109 3.89 0.80 1,888 0.80 1,888 0.80 


n/r = not reported due to n<100.
Table A.13.5 Summary of Test Reliability Estimates for Subgroups: ELA/L Grade 7


Max. Raw Avg. Avg. Minimum Reliability Maximum Reliability
Score SEM Reliability N Alpha N Alpha
Total Group 109 5.92 0.93 1,572 0.84 36,260 0.94 
Gender 
Male 109 5.74 0.93 1,019 0.85 18,546 0.94 
Female 109 6.09 0.93 553 0.83 17,714 0.93 
Ethnicity 
White 109 5.93 0.93 285 0.91 13,123 0.93 
Black/African American 109 5.87 0.92 814 0.78 12,275 0.93 
Asian/Pacific Islander 109 5.96 0.93 2,467 0.93 2,267 0.93 
American Indian/Alaska Native n/r n/r n/r n/r n/r n/r n/r 
Hispanic/Latino 109 5.90 0.92 382 0.79 6,374 0.93 
Multiple 109 6.01 0.93 1,738 0.93 1,740 0.93 
Special Instruction Needs 
Economically Disadvantaged 109 5.83 0.92 1,097 0.77 14,898 0.93 
Not Economically Disadvantaged 109 5.97 0.93 471 0.90 21,362 0.93 
English Learner 109 5.26 0.85 318 0.69 1,962 0.89 
Non-English Learner 109 5.95 0.93 1,070 0.85 32,262 0.93 
Students with Disabilities 109 5.32 0.92 1,572 0.84 5,211 0.93 
Students without Disabilities 109 6.02 0.93 30,998 0.93 31,049 0.93 
Students Taking Accommodated Forms
ASL n/r n/r n/r n/r n/r n/r n/r 
Closed-Caption n/r n/r n/r n/r n/r n/r n/r 
Screen Reader n/r n/r n/r n/r n/r n/r n/r 
Text-to-Speech 109 4.23 0.80 1,526 0.80 1,526 0.80 


n/r = not reported due to n<100. 
Table A.13.6 Summary of Test Reliability Estimates for Subgroups: ELA/L Grade 8


Max. Raw Avg. Avg. Minimum Reliability Maximum Reliability
Score SEM Reliability N Alpha N Alpha
Total Group 109 5.62 0.94 1,268 0.91 35,079 0.94 
Gender 
Male 109 5.49 0.94 823 0.89 17,701 0.94 
Female 109 5.75 0.94 445 0.93 17,378 0.94 
Ethnicity 
White 109 5.67 0.93 13,089 0.93 269 0.95 
Black/African American 109 5.55 0.92 683 0.82 11,894 0.93 
Asian/Pacific Islander 109 5.66 0.93 2,363 0.93 2,423 0.93 
American Indian/Alaska Native 109 5.60 0.93 103 0.93 103 0.93 
Hispanic/Latino 109 5.56 0.93 257 0.82 6,153 0.93 
Multiple 109 5.78 0.93 1,642 0.93 1,572 0.94 
Special Instruction Needs 
Economically Disadvantaged 109 5.46 0.92 859 0.82 14,043 0.93 
Not Economically Disadvantaged 109 5.72 0.93 21,517 0.93 402 0.95 
English Learner 109 4.73 0.87 209 0.64 1,971 0.88 
Non-English Learner 109 5.66 0.94 923 0.92 31,290 0.94 
Students with Disabilities 109 5.00 0.93 1,268 0.91 5,150 0.94 
Students without Disabilities 109 5.73 0.94 30,394 0.93 29,998 0.94 
Students Taking Accommodated Forms
ASL n/r n/r n/r n/r n/r n/r n/r 
Closed-Caption n/r n/r n/r n/r n/r n/r n/r 
Screen Reader n/r n/r n/r n/r n/r n/r n/r 
Text-to-Speech 109 3.86 0.83 1,187 0.83 1,187 0.83 


n/r = not reported due to n<100. 
Table A.13.7 Summary of Test Reliability Estimates for Subgroups: ELA/L Grade 9


Max. Raw Avg. Avg. Minimum Reliability Maximum Reliability
Score SEM Reliability N Alpha N Alpha
Total Group 109 5.56 0.93 109 0.72 1,860 0.94 
Gender 
Male 109 5.28 0.95 621 0.94 888 0.95 
Female 109 5.89 0.94 972 0.94 607 0.94 
Ethnicity 
White 109 5.95 0.92 184 0.91 140 0.92 
Black/African American 109 5.53 0.92 821 0.92 1,303 0.92 
Asian/Pacific Islander n/r n/r n/r n/r n/r n/r n/r 
American Indian/Alaska Native n/r n/r n/r n/r n/r n/r n/r 
Hispanic/Latino 109 5.50 0.94 217 0.93 313 0.95 
Multiple n/r n/r n/r n/r n/r n/r n/r 
Special Instruction Needs 
Economically Disadvantaged 109 5.40 0.90 101 0.62 1,421 0.92 
Not Economically Disadvantaged 109 6.04 0.93 438 0.93 316 0.94 
English Learner 109 4.70 0.93 107 0.92 150 0.94 
Non-English Learner n/r n/r n/r n/r n/r n/r n/r 
Students with Disabilities 109 4.75 0.90 109 0.72 361 0.94 
Students without Disabilities 109 5.78 0.94 966 0.94 1,499 0.94 
Students Taking Accommodated Forms
ASL n/r n/r n/r n/r n/r n/r n/r 
Closed-Caption n/r n/r n/r n/r n/r n/r n/r 
Screen Reader n/r n/r n/r n/r n/r n/r n/r 
Text-to-Speech 109 4.01 0.72 109 0.72 109 0.72 


n/r = not reported due to n<100. 
Table A.13.8 Summary of Test Reliability Estimates for Subgroups: ELA/L Grade 10


Max. Raw Avg. Avg. Minimum Reliability Maximum Reliability
Score SEM Reliability N Alpha N Alpha
Total Group 109 5.83 0.94 146 0.89 39,231 0.94 
Gender 
Male 109 5.70 0.94 397 0.88 15,133 0.94 
Female 109 5.95 0.94 219 0.91 19,440 0.94 
Ethnicity 
White 109 5.90 0.93 172 0.93 13,818 0.93 
Black/African American 109 5.75 0.92 325 0.79 10,358 0.93 
Asian/Pacific Islander 109 5.94 0.92 2,643 0.92 2,065 0.92 
American Indian/Alaska Native n/r n/r n/r n/r n/r n/r n/r 
Hispanic/Latino 109 5.69 0.93 6,634 0.93 5,185 0.93 
Multiple 109 6.02 0.93 1,728 0.93 1,299 0.93 
Special Instruction Needs 
Economically Disadvantaged 109 5.69 0.92 319 0.80 10,993 0.93 
Not Economically Disadvantaged 109 5.91 0.93 282 0.92 18,623 0.93 
English Learner 109 4.84 0.85 2,699 0.84 2,087 0.86 
Non-English Learner 109 5.89 0.93 142 0.89 34,667 0.93 
Students with Disabilities 109 5.35 0.93 616 0.90 4,572 0.94 
Students without Disabilities 109 5.91 0.93 25,044 0.93 33,203 0.93 
Students Taking Accommodated Forms
ASL n/r n/r n/r n/r n/r n/r n/r 
Closed-Caption n/r n/r n/r n/r n/r n/r n/r 
Screen Reader n/r n/r n/r n/r n/r n/r n/r 
Text-to-Speech 109 4.36 0.86 560 0.86 560 0.86 


n/r = not reported due to n<100. 
Table A.13.9 Summary of Test Reliability Estimates for Subgroups: Mathematics Grade 3


Max. Raw Avg. Avg. Minimum Reliability Maximum Reliability
Score SEM Reliability N Alpha N Alpha
Total Group 66 3.54 0.95 779 0.90 122 0.95
Gender
Male 66 3.53 0.95 513 0.91 18,111 0.95
Female 66 3.55 0.94 266 0.88 16,881 0.94
Ethnicity
White 66 3.62 0.93 199 0.91 12,396 0.93
Black/African American 66 3.41 0.94 275 0.89 11,488 0.94
Asian/Pacific Islander 66 3.53 0.93 2,217 0.93 2,330 0.93
American Indian/Alaska Native n/r n/r n/r n/r n/r n/r n/r
Hispanic/Latino 66 3.47 0.94 246 0.88 121 0.95
Multiple 66 3.59 0.94 1,799 0.94 1,879 0.94
Special Instruction Needs
Economically Disadvantaged 66 3.41 0.94 463 0.89 15,281 0.94
Not Economically Disadvantaged 66 3.60 0.94 311 0.91 19,841 0.94
English Learner 66 3.39 0.93 260 0.89 122 0.95
Non-English Learner 66 3.56 0.94 451 0.90 27,605 0.95
Students with Disabilities 66 3.33 0.94 687 0.89 5,058 0.94
Students without Disabilities 66 3.57 0.94 30,006 0.94 106 0.96
Students Taking Accommodated Forms
ASL n/r n/r n/r n/r n/r n/r n/r
Closed-Caption n/r n/r n/r n/r n/r n/r n/r
Screen Reader n/r n/r n/r n/r n/r n/r n/r
Text-to-Speech 66 3.30 0.93 7,150 0.93 7,192 0.93
Students Taking Translated Forms
Spanish Language Form 66 3.13 0.95 122 0.95 122 0.95


n/r = not reported due to n<100.
Table A.13.10 Summary of Test Reliability Estimates for Subgroups: Mathematics Grade 4


Max. Raw Avg. Avg. Minimum Reliability Maximum Reliability
Score SEM Reliability N Alpha N Alpha
Total Group 66 3.67 0.95 799 0.90 38,354 0.95
Gender
Male 66 3.63 0.95 539 0.90 19,542 0.95
Female 66 3.69 0.94 260 0.90 18,812 0.95
Ethnicity
White 66 3.78 0.93 216 0.91 13,232 0.94
Black/African American 66 3.54 0.94 281 0.90 12,894 0.94
Asian/Pacific Islander 66 3.66 0.94 2,343 0.93 2,384 0.94
American Indian/Alaska Native n/r n/r n/r n/r n/r n/r n/r
Hispanic/Latino 66 3.60 0.94 241 0.89 7,168 0.94
Multiple 66 3.73 0.94 1,945 0.94 2,061 0.94
Special Instruction Needs
Economically Disadvantaged 66 3.54 0.94 484 0.89 16,695 0.94
Not Economically Disadvantaged 66 3.75 0.94 308 0.91 21,338 0.94
English Learner 66 3.47 0.92 250 0.86 4,903 0.93
Non-English Learner 66 3.70 0.95 500 0.92 30,704 0.95
Students with Disabilities 66 3.33 0.94 702 0.89 6,003 0.95
Students without Disabilities 66 3.73 0.94 135 0.91 32,351 0.94
Students Taking Accommodated Forms
ASL n/r n/r n/r n/r n/r n/r n/r
Closed-Caption n/r n/r n/r n/r n/r n/r n/r
Screen Reader n/r n/r n/r n/r n/r n/r n/r
Text-to-Speech 66 3.37 0.94 7,965 0.93 7,639 0.94
Students Taking Translated Forms
Spanish Language Form 66 2.71 0.91 142 0.91 142 0.91


n/r = not reported due to n<100.
Table A.13.11 Summary of Test Reliability Estimates for Subgroups: Mathematics Grade 5


Max. Raw Avg. Avg. Minimum Reliability Maximum Reliability
Score SEM Reliability N Alpha N Alpha
Total Group 66 3.64 0.94 142 0.86 37,673 0.95
Gender
Male 66 3.59 0.95 454 0.89 19,488 0.95
Female 66 3.70 0.94 240 0.85 18,185 0.94
Ethnicity
White 66 3.73 0.94 190 0.89 13,109 0.94
Black/African American 66 3.53 0.92 241 0.89 12,927 0.93
Asian/Pacific Islander 66 3.62 0.94 2,372 0.94 2,465 0.94
American Indian/Alaska Native 66 3.70 0.93 122 0.93 139 0.94
Hispanic/Latino 66 3.59 0.93 209 0.78 6,919 0.93
Multiple 66 3.70 0.94 1,890 0.94 1,919 0.95
Special Instruction Needs
Economically Disadvantaged 66 3.52 0.92 406 0.85 16,461 0.93
Not Economically Disadvantaged 66 3.71 0.94 284 0.90 21,210 0.94
English Learner 66 3.32 0.90 194 0.81 3,585 0.91
Non-English Learner 66 3.67 0.94 448 0.89 31,818 0.95
Students with Disabilities 66 3.36 0.93 129 0.86 6,138 0.94
Students without Disabilities 66 3.69 0.94 134 0.91 31,535 0.95
Students Taking Accommodated Forms
ASL n/r n/r n/r n/r n/r n/r n/r
Closed-Caption n/r n/r n/r n/r n/r n/r n/r
Screen Reader n/r n/r n/r n/r n/r n/r n/r
Text-to-Speech 66 3.34 0.91 6,999 0.90 6,995 0.92
Students Taking Translated Forms
Spanish Language Form 66 2.90 0.91 148 0.91 148 0.91


n/r = not reported due to n<100.
Table A.13.12 Summary of Test Reliability Estimates for Subgroups: Mathematics Grade 6


Max. Raw Avg. Avg. Minimum Reliability Maximum Reliability
Score SEM Reliability N Alpha N Alpha
Total Group 66 3.41 0.95 105 0.84 37,020 0.95 
Gender 
Male 66 3.35 0.95 228 0.88 18,917 0.95 
Female 66 3.46 0.94 123 0.84 18,103 0.95 
Ethnicity 
White 66 3.70 0.94 126 0.91 13,273 0.94 
Black/African American 66 3.03 0.92 148 0.62 12,423 0.92 
Asian/Pacific Islander 66 3.84 0.95 2,323 0.94 2,306 0.95 
American Indian/Alaska Native 66 3.25 0.94 109 0.94 109 0.94 
Hispanic/Latino 66 3.19 0.93 104 0.84 7,096 0.93 
Multiple 66 3.56 0.95 1,839 0.94 1,808 0.95 
Special Instruction Needs 
Economically Disadvantaged 66 3.02 0.92 190 0.75 16,233 0.92 
Not Economically Disadvantaged 66 3.66 0.94 153 0.89 21,148 0.95 
English Learner 66 2.62 0.89 105 0.84 2,542 0.90 
Non-English Learner 66 3.46 0.95 272 0.88 32,237 0.95 
Students with Disabilities 66 2.86 0.93 344 0.87 6,000 0.94 
Students without Disabilities 66 3.49 0.94 104 0.84 31,020 0.95 
Students Taking Accommodated Forms
ASL n/r n/r n/r n/r n/r n/r n/r 
Closed-Caption n/r n/r n/r n/r n/r n/r n/r 
Screen Reader n/r n/r n/r n/r n/r n/r n/r 
Text-to-Speech 66 2.72 0.91 4,427 0.90 4,511 0.92 
Students Taking Translated Forms
Spanish Language Form 66 2.67 0.84 105 0.84 105 0.84 


n/r = not reported due to n<100. 
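
The Alpha and Avg. SEM columns in these tables are linked by standard classical test theory formulas. The sketch below is illustrative only. It assumes coefficient alpha is computed from a student-by-item score matrix and that SEM = SD * sqrt(1 - reliability); the function names and toy data are hypothetical, and the report's own computation (for example, how estimates are averaged across forms) may differ. The n<100 suppression rule is the one stated in the table note.

```python
import math
from statistics import pvariance

MIN_N = 100  # reporting threshold from the table note: n<100 is shown as "n/r"


def cronbach_alpha(item_scores):
    """Coefficient alpha for a list of per-student item-score lists.

    Population variances are used for both items and totals; alpha is a
    ratio of variances, so the choice of denominator cancels out.
    """
    k = len(item_scores[0])  # number of items
    item_vars = [pvariance([student[i] for student in item_scores]) for i in range(k)]
    total_var = pvariance([sum(student) for student in item_scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


def standard_error_of_measurement(sd, reliability):
    """Classical SEM: raw-score SD scaled by sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def report(n, value):
    """Format a statistic, suppressing it when the subgroup is too small."""
    return "n/r" if n < MIN_N else f"{value:.2f}"


if __name__ == "__main__":
    toy = [[1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 1, 0]]  # hypothetical scores
    print(report(len(toy), cronbach_alpha(toy)))  # suppressed: only 4 students
    print(report(105, 0.84))
```

Holding the score SD fixed, a lower alpha gives a larger SEM, so a small SEM alongside a low reliability (as in some of the smaller subgroups) suggests a narrower raw-score spread in that group.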

Table A.13.13 Summary of Test Reliability Estimates for Subgroups: Mathematics Grade 7

Max. Raw Avg. Avg. Minimum Reliability Maximum Reliability
Score SEM Reliability N Alpha N Alpha
Total Group 66 3.42 0.93 122 0.75 29,514 0.93
Gender
Male 66 3.36 0.94 154 0.90 15,176 0.94
Female 66 3.48 0.93 14,558 0.93 14,338 0.93
Ethnicity
White 66 3.67 0.93 10,225 0.93 10,132 0.93
Black/African American 66 3.17 0.90 101 0.66 11,116 0.91
Asian/Pacific Islander 66 3.76 0.93 1,274 0.93 1,275 0.93
American Indian/Alaska Native n/r n/r n/r n/r n/r n/r n/r
Hispanic/Latino 66 3.23 0.91 122 0.75 5,625 0.91
Multiple 66 3.52 0.93 1,198 0.92 1,188 0.93
Special Instruction Needs
Economically Disadvantaged 66 3.16 0.90 14,225 0.90 14,238 0.91
Not Economically Disadvantaged 66 3.63 0.93 132 0.90 15,273 0.93
English Learner 66 2.73 0.86 122 0.75 2,039 0.88
Non-English Learner 66 3.47 0.93 188 0.89 25,583 0.93
Students with Disabilities 66 2.96 0.91 216 0.89 5,309 0.92
Students without Disabilities 66 3.51 0.93 116 0.75 24,205 0.93
Students Taking Accommodated Forms
ASL n/r n/r n/r n/r n/r n/r n/r
Closed-Caption n/r n/r n/r n/r n/r n/r n/r
Screen Reader n/r n/r n/r n/r n/r n/r n/r
Text-to-Speech 66 2.92 0.91 3,527 0.91 3,634 0.92
Students Taking Translated Forms
Spanish Language Form 66 2.56 0.75 122 0.75 122 0.75

n/r = not reported due to n<100.

Table A.13.14 Summary of Test Reliability Estimates for Subgroups: Mathematics Grade 8

Max. Raw Avg. Avg. Minimum Reliability Maximum Reliability
Score SEM Reliability N Alpha N Alpha
Total Group 66 2.90 0.88 191 0.67 19,437 0.89
Gender
Male 66 2.83 0.89 130 0.74 10,225 0.90
Female 66 2.98 0.88 8,644 0.86 9,212 0.89
Ethnicity
White 66 3.17 0.89 5,365 0.88 5,736 0.90
Black/African American 66 2.13 0.84 106 0.36 8,709 0.86
Asian/Pacific Islander 66 3.20 0.91 435 0.90 505 0.92
American Indian/Alaska Native n/r n/r n/r n/r n/r n/r n/r
Hispanic/Latino 66 2.16 0.85 142 0.68 3,713 0.87
Multiple 66 2.99 0.88 658 0.87 671 0.89
Special Instruction Needs
Economically Disadvantaged 66 2.13 0.85 10,038 0.84 10,672 0.87
Not Economically Disadvantaged 66 3.09 0.89 8,112 0.88 8,759 0.90
English Learner 66 2.45 0.79 142 0.68 1,669 0.82
Non-English Learner 66 2.93 0.88 157 0.69 16,236 0.89
Students with Disabilities 66 2.60 0.86 190 0.67 4,355 0.86
Students without Disabilities 66 2.98 0.88 139 0.68 15,082 0.89
Students Taking Accommodated Forms
ASL n/r n/r n/r n/r n/r n/r n/r
Closed-Caption n/r n/r n/r n/r n/r n/r n/r
Screen Reader n/r n/r n/r n/r n/r n/r n/r
Text-to-Speech 66 2.56 0.83 2,738 0.81 2,603 0.85
Students Taking Translated Forms
Spanish Language Form 66 2.38 0.68 142 0.68 142 0.68

n/r = not reported due to n<100.

Max. Raw Avg. Avg. Minimum Reliability Maximum Reliability
Score SEM Reliability N Alpha N Alpha
Total Group 81 3.48 0.93 175 0.50 34,458 0.94
Gender
Male 81 3.45 0.94 243 0.88 17,647 0.94
Female 81 3.51 0.93 178 0.87 16,811 0.94
Ethnicity
White 81 3.79 0.93 11,729 0.93 12,650 0.93
Black/African American 81 3.09 0.88 10,559 0.87 11,795 0.88
Asian/Pacific Islander 81 3.96 0.95 2,084 0.95 2,376 0.95
American Indian/Alaska Native n/r n/r n/r n/r n/r n/r n/r
Hispanic/Latino 81 3.16 0.89 172 0.51 5,812 0.91
Multiple 81 3.62 0.93 1,448 0.93 1,600 0.94
Special Instruction Needs
Economically Disadvantaged 81 3.07 0.87 252 0.77 13,129 0.88
Not Economically Disadvantaged 81 3.69 0.94 111 0.51 21,326 0.94
English Learner 81 2.77 0.83 175 0.50 2,240 0.85
Non-English Learner 81 3.53 0.94 225 0.90 30,895 0.94
Students with Disabilities 81 3.05 0.91 246 0.90 5,920 0.92
Students without Disabilities 81 3.56 0.93 173 0.50 28,538 0.94
Students Taking Accommodated Forms
ASL n/r n/r n/r n/r n/r n/r n/r
Closed-Caption n/r n/r n/r n/r n/r n/r n/r
Screen Reader n/r n/r n/r n/r n/r n/r n/r
Text-to-Speech 80 2.89 0.84 2,090 0.82 2,061 0.87
Students Taking Translated Forms
Spanish Language Form 81 2.13 0.66 175 0.50 282 0.76

n/r = not reported due to n<100.

Max. Raw Avg. Avg. Minimum Reliability Maximum Reliability
Score SEM Reliability N Alpha N Alpha
Total Group 81 3.83 0.95 5,749 0.95 6,254 0.95
Gender
Male 81 3.82 0.95 2,978 0.95 3,192 0.96
Female 81 3.83 0.95 2,771 0.94 3,062 0.95
Ethnicity
White 81 4.04 0.93 2,189 0.93 2,250 0.94
Black/African American 81 3.25 0.93 1,457 0.92 1,782 0.93
Asian/Pacific Islander 81 4.17 0.94 983 0.93 975 0.95
American Indian/Alaska Native n/r n/r n/r n/r n/r n/r n/r
Hispanic/Latino 81 3.49 0.93 674 0.93 769 0.94
Multiple 81 3.92 0.94 342 0.94 351 0.94
Special Instruction Needs
Economically Disadvantaged 81 3.21 0.93 1,409 0.92 1,801 0.93
Not Economically Disadvantaged 81 3.99 0.94 4,340 0.94 4,453 0.95
English Learner 81 3.03 0.92 254 0.90 196 0.93
Non-English Learner 81 4.01 0.94 4,322 0.94 4,455 0.95
Students with Disabilities 81 3.26 0.96 687 0.96 624 0.96
Students without Disabilities 81 3.88 0.95 5,125 0.94 5,567 0.95
Students Taking Accommodated Forms
ASL n/r n/r n/r n/r n/r n/r n/r
Closed-Caption n/r n/r n/r n/r n/r n/r n/r
Screen Reader n/r n/r n/r n/r n/r n/r n/r
Text-to-Speech 81 2.70 0.85 165 0.84 149 0.85
Students Taking Translated Forms
Spanish Language Form n/r n/r n/r n/r n/r n/r n/r

n/r = not reported due to n<100.

Max. Raw Avg. Avg. Minimum Reliability Maximum Reliability
Score SEM Reliability N Alpha N Alpha
Total Group 81 3.93 0.94 2,213 0.94 2,002 0.94
Gender
Male 81 3.96 0.94 1,100 0.94 993 0.94
Female 81 3.89 0.94 1,113 0.93 1,009 0.94
Ethnicity
White 81 4.00 0.93 1,046 0.92 967 0.93
Black/African American 81 3.58 0.92 240 0.91 210 0.94
Asian/Pacific Islander 81 4.13 0.93 368 0.93 372 0.94
American Indian/Alaska Native n/r n/r n/r n/r n/r n/r n/r
Hispanic/Latino 81 3.62 0.91 274 0.90 255 0.91
Multiple 81 3.84 0.93 211 0.93 149 0.93
Special Instruction Needs
Economically Disadvantaged 81 3.84 0.93 108 0.93 108 0.93
Not Economically Disadvantaged 81 3.93 0.94 2,105 0.94 1,921 0.94
English Learner n/r n/r n/r n/r n/r n/r n/r
Non-English Learner 81 3.95 0.94 2,087 0.94 1,898 0.94
Students with Disabilities 81 3.53 0.95 137 0.95 144 0.95
Students without Disabilities 81 3.96 0.94 2,076 0.93 1,858 0.94
Students Taking Accommodated Forms
ASL n/r n/r n/r n/r n/r n/r n/r
Closed-Caption n/r n/r n/r n/r n/r n/r n/r
Screen Reader n/r n/r n/r n/r n/r n/r n/r
Text-to-Speech n/r n/r n/r n/r n/r n/r n/r
Students Taking Translated Forms
Spanish Language Form n/r n/r n/r n/r n/r n/r n/r

n/r = not reported due to n<100.

Appendix 13.2: Reliability of Classification by Content and Grade/Subject


Table A.13.18 Reliability of Classification: Grade 3 ELA/L

                      Full Summative
                      Scale Score     Level 1  Level 2  Level 3  Level 4  Level 5  Category Total
Decision Accuracy     650-699         0.18     0.03     0.00     0.00     0.00     0.20
                      700-724         0.03     0.11     0.04     0.00     0.00     0.19
                      725-749         0.00     0.04     0.12     0.04     0.00     0.21
                      750-809         0.00     0.00     0.04     0.31     0.02     0.37
                      810-850         0.00     0.00     0.00     0.01     0.02     0.03
Decision Consistency  650-699         0.17     0.04     0.00     0.00     0.00     0.21
                      700-724         0.04     0.09     0.05     0.01     0.00     0.18
                      725-749         0.00     0.04     0.10     0.05     0.00     0.20
                      750-809         0.00     0.01     0.05     0.28     0.02     0.36
                      810-850         0.00     0.00     0.00     0.02     0.02     0.04

Table A.13.19 Reliability of Classification: Grade 4 ELA/L

                      Full Summative
                      Scale Score     Level 1  Level 2  Level 3  Level 4  Level 5  Category Total
Decision Accuracy     650-699         0.12     0.02     0.00     0.00     0.00     0.14
                      700-724         0.03     0.12     0.04     0.00     0.00     0.19
                      725-749         0.00     0.04     0.16     0.04     0.00     0.24
                      750-809         0.00     0.00     0.04     0.25     0.03     0.33
                      810-850         0.00     0.00     0.00     0.02     0.08     0.10
Decision Consistency  650-699         0.12     0.03     0.00     0.00     0.00     0.15
                      700-724         0.03     0.10     0.05     0.00     0.00     0.19
                      725-749         0.00     0.05     0.13     0.06     0.00     0.23
                      750-809         0.00     0.00     0.06     0.22     0.03     0.32
                      810-850         0.00     0.00     0.00     0.04     0.08     0.12

Table A.13.20 Reliability of Classification: Grade 5 ELA/L

                      Full Summative
                      Scale Score     Level 1  Level 2  Level 3  Level 4  Level 5  Category Total
Decision Accuracy     650-699         0.11     0.02     0.00     0.00     0.00     0.13
                      700-724         0.03     0.13     0.04     0.00     0.00     0.19
                      725-749         0.00     0.04     0.17     0.05     0.00     0.25
                      750-809         0.00     0.00     0.04     0.32     0.02     0.39
                      810-850         0.00     0.00     0.00     0.01     0.03     0.04
Decision Consistency  650-699         0.11     0.03     0.00     0.00     0.00     0.14
                      700-724         0.03     0.10     0.05     0.00     0.00     0.19
                      725-749         0.00     0.05     0.13     0.06     0.00     0.24
                      750-809         0.00     0.00     0.06     0.30     0.02     0.38
                      810-850         0.00     0.00     0.00     0.02     0.03     0.05


Table A.13.21 Reliability of Classification: Grade 6 ELA/L

                      Full Summative
                      Scale Score     Level 1  Level 2  Level 3  Level 4  Level 5  Category Total
Decision Accuracy     650-699         0.10     0.01     0.00     0.00     0.00     0.11
                      700-724         0.03     0.14     0.03     0.00     0.00     0.20
                      725-749         0.00     0.04     0.20     0.04     0.00     0.28
                      750-809         0.00     0.00     0.04     0.29     0.02     0.35
                      810-850         0.00     0.00     0.00     0.01     0.05     0.06
Decision Consistency  650-699         0.09     0.03     0.00     0.00     0.00     0.12
                      700-724         0.03     0.12     0.05     0.00     0.00     0.20
                      725-749         0.00     0.04     0.17     0.06     0.00     0.27
                      750-809         0.00     0.00     0.06     0.26     0.02     0.34
                      810-850         0.00     0.00     0.00     0.02     0.05     0.07
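
Each classification table is a 5 x 5 matrix of proportions, so summary indices can be read off directly. The sketch below is a reading aid rather than the report's procedure: it takes the decision accuracy block for Grade 6 ELA/L from Table A.13.21, sums the diagonal to get the proportion of students classified into the same level on both dimensions, and collapses the matrix at a single cut. The function names are hypothetical, and treating Level 4 as the cut of interest is an assumption made only for illustration.

```python
# Decision accuracy proportions for Grade 6 ELA/L, copied from Table A.13.21.
# Rows are the five scale-score ranges; columns are Level 1 through Level 5.
ACCURACY_G6_ELA = [
    [0.10, 0.01, 0.00, 0.00, 0.00],  # 650-699
    [0.03, 0.14, 0.03, 0.00, 0.00],  # 700-724
    [0.00, 0.04, 0.20, 0.04, 0.00],  # 725-749
    [0.00, 0.00, 0.04, 0.29, 0.02],  # 750-809
    [0.00, 0.00, 0.00, 0.01, 0.05],  # 810-850
]


def exact_agreement(matrix):
    """Sum of the diagonal: proportion placed in the same level both ways."""
    return sum(matrix[i][i] for i in range(len(matrix)))


def category_totals(matrix):
    """Row sums, which should match the Category Total column up to rounding."""
    return [sum(row) for row in matrix]


def agreement_at_cut(matrix, cut):
    """Agreement after collapsing to below vs. at-or-above level index `cut`."""
    n = len(matrix)
    return sum(
        matrix[i][j]
        for i in range(n)
        for j in range(n)
        if (i >= cut) == (j >= cut)
    )


if __name__ == "__main__":
    print(round(exact_agreement(ACCURACY_G6_ELA), 2))
    print([round(t, 2) for t in category_totals(ACCURACY_G6_ELA)])
    # cut=3 is the zero-based index of Level 4 (an illustrative choice).
    print(round(agreement_at_cut(ACCURACY_G6_ELA, 3), 2))
```

Because the published cells are rounded to two decimals, row sums computed this way can differ from the printed Category Total by 0.01 in some tables.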

Table A.13.22 Reliability of Classification: Grade 7 ELA/L

                      Full Summative
                      Scale Score     Level 1  Level 2  Level 3  Level 4  Level 5  Category Total
Decision Accuracy     650-699         0.11     0.02     0.00     0.00     0.00     0.13
                      700-724         0.03     0.11     0.03     0.00     0.00     0.17
                      725-749         0.00     0.04     0.15     0.04     0.00     0.23
                      750-809         0.00     0.00     0.04     0.24     0.03     0.32
                      810-850         0.00     0.00     0.00     0.03     0.13     0.16
Decision Consistency  650-699         0.11     0.03     0.00     0.00     0.00     0.14
                      700-724         0.03     0.09     0.05     0.00     0.00     0.17
                      725-749         0.00     0.04     0.12     0.06     0.00     0.22
                      750-809         0.00     0.00     0.05     0.21     0.04     0.30
                      810-850         0.00     0.00     0.00     0.05     0.12     0.17

Table A.13.23 Reliability of Classification: Grade 8 ELA/L

                      Full Summative
                      Scale Score     Level 1  Level 2  Level 3  Level 4  Level 5  Category Total
Decision Accuracy     650-699         0.14     0.02     0.00     0.00     0.00     0.16
                      700-724         0.03     0.11     0.03     0.00     0.00     0.17
                      725-749         0.00     0.03     0.15     0.04     0.00     0.22
                      750-809         0.00     0.00     0.04     0.29     0.02     0.35
                      810-850         0.00     0.00     0.00     0.02     0.09     0.11
Decision Consistency  650-699         0.14     0.03     0.00     0.00     0.00     0.17
                      700-724         0.03     0.09     0.05     0.00     0.00     0.17
                      725-749         0.00     0.04     0.12     0.05     0.00     0.21
                      750-809         0.00     0.00     0.05     0.26     0.03     0.34
                      810-850         0.00     0.00     0.00     0.03     0.08     0.12

Table A.13.24 Reliability of Classification: Grade 9 ELA/L

                      Full Summative
                      Scale Score     Level 1  Level 2  Level 3  Level 4  Level 5  Category Total
Decision Accuracy     650-699         0.20     0.03     0.00     0.00     0.00     0.23
                      700-724         0.04     0.14     0.04     0.00     0.00     0.22
                      725-749         0.00     0.04     0.14     0.04     0.00     0.22
                      750-809         0.00     0.00     0.04     0.19     0.02     0.25
                      810-850         0.00     0.00     0.00     0.01     0.07     0.08
Decision Consistency  650-699         0.19     0.04     0.00     0.00     0.00     0.23
                      700-724         0.04     0.11     0.05     0.00     0.00     0.21
                      725-749         0.00     0.05     0.11     0.05     0.00     0.21
                      750-809         0.00     0.00     0.05     0.17     0.03     0.25
                      810-850         0.00     0.00     0.00     0.02     0.07     0.09

Table A.13.25 Reliability of Classification: Grade 10 ELA/L

                      Full Summative
                      Scale Score     Level 1  Level 2  Level 3  Level 4  Level 5  Category Total
Decision Accuracy     650-699         0.15     0.02     0.00     0.00     0.00     0.17
                      700-724         0.03     0.08     0.03     0.00     0.00     0.14
                      725-749         0.00     0.03     0.11     0.04     0.00     0.18
                      750-809         0.00     0.00     0.04     0.26     0.03     0.33
                      810-850         0.00     0.00     0.00     0.03     0.16     0.19
Decision Consistency  650-699         0.14     0.03     0.00     0.00     0.00     0.17
                      700-724         0.03     0.07     0.04     0.01     0.00     0.14
                      725-749         0.00     0.03     0.08     0.05     0.00     0.17
                      750-809         0.00     0.00     0.05     0.22     0.04     0.31
                      810-850         0.00     0.00     0.00     0.05     0.15     0.20

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Page 328 


2019 Technical Report 


Table A.13.26 Reliability of Classification: Grade 3 Mathematics

                      Full Summative
                      Scale Score     Level 1  Level 2  Level 3  Level 4  Level 5  Category Total
Decision Accuracy     650-699         0.12     0.02     0.00     0.00     0.00     0.14
                      700-724         0.02     0.12     0.03     0.00     0.00     0.18
                      725-749         0.00     0.03     0.18     0.03     0.00     0.25
                      750-809         0.00     0.00     0.04     0.28     0.02     0.34
                      810-850         0.00     0.00     0.00     0.02     0.08     0.10
Decision Consistency  650-699         0.12     0.03     0.00     0.00     0.00     0.15
                      700-724         0.03     0.11     0.05     0.00     0.00     0.18
                      725-749         0.00     0.04     0.15     0.05     0.00     0.24
                      750-809         0.00     0.00     0.05     0.25     0.03     0.33
                      810-850         0.00     0.00     0.00     0.03     0.07     0.10

Table A.13.27 Reliability of Classification: Grade 4 Mathematics

                      Full Summative
                      Scale Score     Level 1  Level 2  Level 3  Level 4  Level 5  Category Total
Decision Accuracy     650-699         0.12     0.02     0.00     0.00     0.00     0.14
                      700-724         0.02     0.15     0.03     0.00     0.00     0.20
                      725-749         0.00     0.03     0.19     0.03     0.00     0.26
                      750-809         0.00     0.00     0.03     0.30     0.01     0.35
                      810-850         0.00     0.00     0.00     0.01     0.04     0.05
Decision Consistency  650-699         0.12     0.03     0.00     0.00     0.00     0.14
                      700-724         0.03     0.13     0.05     0.00     0.00     0.20
                      725-749         0.00     0.04     0.16     0.05     0.00     0.25
                      750-809         0.00     0.00     0.05     0.28     0.02     0.35
                      810-850         0.00     0.00     0.00     0.02     0.04     0.06

Table A.13.28 Reliability of Classification: Grade 5 Mathematics

                      Full Summative
                      Scale Score     Level 1  Level 2  Level 3  Level 4  Level 5  Category Total
Decision Accuracy     650-699         0.10     0.02     0.00     0.00     0.00     0.12
                      700-724         0.03     0.19     0.03     0.00     0.00     0.25
                      725-749         0.00     0.04     0.19     0.04     0.00     0.26
                      750-809         0.00     0.00     0.03     0.24     0.02     0.29
                      810-850         0.00     0.00     0.00     0.01     0.06     0.08
Decision Consistency  650-699         0.10     0.04     0.00     0.00     0.00     0.13
                      700-724         0.03     0.16     0.05     0.00     0.00     0.24
                      725-749         0.00     0.05     0.16     0.05     0.00     0.26
                      750-809         0.00     0.00     0.05     0.22     0.02     0.29
                      810-850         0.00     0.00     0.00     0.02     0.06     0.08


Table A.13.29 Reliability of Classification: Grade 6 Mathematics

                      Full Summative
                      Scale Score     Level 1  Level 2  Level 3  Level 4  Level 5  Category Total
Decision Accuracy     650-699         0.15     0.02     0.00     0.00     0.00     0.17
                      700-724         0.02     0.20     0.03     0.00     0.00     0.26
                      725-749         0.00     0.04     0.21     0.03     0.00     0.28
                      750-809         0.00     0.00     0.03     0.22     0.01     0.26
                      810-850         0.00     0.00     0.00     0.01     0.03     0.04
Decision Consistency  650-699         0.14     0.03     0.00     0.00     0.00     0.18
                      700-724         0.03     0.18     0.05     0.00     0.00     0.26
                      725-749         0.00     0.05     0.18     0.04     0.00     0.27
                      750-809         0.00     0.00     0.04     0.20     0.01     0.26
                      810-850         0.00     0.00     0.00     0.01     0.03     0.04

Table A.13.30 Reliability of Classification: Grade 7 Mathematics

                      Full Summative
                      Scale Score     Level 1  Level 2  Level 3  Level 4  Level 5  Category Total
Decision Accuracy     650-699         0.09     0.02     0.00     0.00     0.00     0.11
                      700-724         0.03     0.25     0.04     0.00     0.00     0.32
                      725-749         0.00     0.04     0.23     0.04     0.00     0.32
                      750-809         0.00     0.00     0.03     0.19     0.01     0.23
                      810-850         0.00     0.00     0.00     0.01     0.03     0.03
Decision Consistency  650-699         0.08     0.04     0.00     0.00     0.00     0.12
                      700-724         0.03     0.22     0.06     0.00     0.00     0.31
                      725-749         0.00     0.06     0.20     0.05     0.00     0.31
                      750-809         0.00     0.00     0.05     0.17     0.01     0.23
                      810-850         0.00     0.00     0.00     0.01     0.03     0.04

Table A.13.31 Reliability of Classification: Grade 8 Mathematics

                      Full Summative
                      Scale Score     Level 1  Level 2  Level 3  Level 4  Level 5  Category Total
Decision Accuracy     650-699         0.37     0.04     0.00     0.00     0.00     0.41
                      700-724         0.06     0.16     0.05     0.00     0.00     0.27
                      725-749         0.00     0.04     0.12     0.03     0.00     0.20
                      750-809         0.00     0.00     0.02     0.09     0.00     0.12
                      810-850         0.00     0.00     0.00     0.00     0.00     0.00
Decision Consistency  650-699         0.35     0.06     0.01     0.00     0.00     0.42
                      700-724         0.07     0.13     0.06     0.01     0.00     0.26
                      725-749         0.01     0.05     0.10     0.04     0.00     0.19
                      750-809         0.00     0.01     0.04     0.08     0.00     0.13
                      810-850         0.00     0.00     0.00     0.00     0.00     0.00

Table A.13.32 Reliability of Classification: Algebra I

                      Full Summative
                      Scale Score     Level 1  Level 2  Level 3  Level 4  Level 5  Category Total
Decision Accuracy     650-699         0.11     0.03     0.00     0.00     0.00     0.14
                      700-724         0.03     0.22     0.04     0.00     0.00     0.29
                      725-749         0.00     0.05     0.17     0.04     0.00     0.26
                      750-809         0.00     0.00     0.03     0.25     0.01     0.29
                      810-850         0.00     0.00     0.00     0.00     0.02     0.03
Decision Consistency  650-699         0.10     0.05     0.00     0.00     0.00     0.15
                      700-724         0.04     0.18     0.05     0.00     0.00     0.28
                      725-749         0.00     0.06     0.14     0.05     0.00     0.25
                      750-809         0.00     0.00     0.05     0.23     0.01     0.29
                      810-850         0.00     0.00     0.00     0.01     0.02     0.03

Table A.13.33 Reliability of Classification: Geometry

                      Full Summative
                      Scale Score     Level 1  Level 2  Level 3  Level 4  Level 5  Category Total
Decision Accuracy     650-699         0.06     0.01     0.00     0.00     0.00     0.07
                      700-724         0.01     0.15     0.02     0.00     0.00     0.19
                      725-749         0.00     0.03     0.19     0.04     0.00     0.26
                      750-809         0.00     0.00     0.03     0.31     0.02     0.36
                      810-850         0.00     0.00     0.00     0.02     0.10     0.12
Decision Consistency  650-699         0.05     0.02     0.00     0.00     0.00     0.07
                      700-724         0.01     0.14     0.03     0.00     0.00     0.19
                      725-749         0.00     0.04     0.17     0.05     0.00     0.26
                      750-809         0.00     0.00     0.04     0.28     0.03     0.35
                      810-850         0.00     0.00     0.00     0.03     0.09     0.13

Table A.13.34 Reliability of Classification: Algebra II

                      Full Summative
                      Scale Score     Level 1  Level 2  Level 3  Level 4  Level 5  Category Total
Decision Accuracy     650-699         0.10     0.02     0.00     0.00     0.00     0.12
                      700-724         0.02     0.12     0.03     0.00     0.00     0.17
                      725-749         0.00     0.04     0.13     0.04     0.00     0.20
                      750-809         0.00     0.00     0.03     0.38     0.02     0.43
                      810-850         0.00     0.00     0.00     0.01     0.06     0.07
Decision Consistency  650-699         0.10     0.03     0.00     0.00     0.00     0.13
                      700-724         0.03     0.10     0.04     0.00     0.00     0.17
                      725-749         0.00     0.04     0.10     0.05     0.00     0.20
                      750-809         0.00     0.00     0.05     0.35     0.02     0.42
                      810-850         0.00     0.00     0.00     0.02     0.06     0.08


Table A.13.35 Reliability of Classification: Integrated Mathematics II

                      Full Summative
                      Scale Score     Level 1  Level 2  Level 3  Level 4  Level 5  Category Total
Decision Accuracy     650-699         0.09     0.02     0.00     0.00     0.00     0.11
                      700-724         0.04     0.15     0.04     0.00     0.00     0.23
                      725-749         0.00     0.04     0.20     0.05     0.00     0.28
                      750-809         0.00     0.00     0.04     0.24     0.02     0.30
                      810-850         0.00     0.00     0.00     0.02     0.06     0.08
Decision Consistency  650-699         0.09     0.03     0.00     0.00     0.00     0.12
                      700-724         0.04     0.13     0.06     0.00     0.00     0.23
                      725-749         0.00     0.04     0.16     0.06     0.00     0.27
                      750-809         0.00     0.00     0.06     0.21     0.02     0.29
                      810-850         0.00     0.00     0.00     0.03     0.06     0.09

Appendix 15: Growth

Appendix 15 provides the summary growth results for subgroups for ELA/L grades 4 through 11 and for
mathematics grades 4 through 8 and high school. Grade 9 ELA/L, Algebra II, and Integrated Mathematics I, II, and III do not
have sufficient sample sizes for subgroup summary analysis.


Table A.15.1 Summary of SGP Estimates for Subgroups: Grade 4 ELA/L 


Total Sample Size  Average SGP  Average Standard Error  Median SGP
Gender 
Male 2,974 52.06 12.90 53 
Female 2,826 56.25 12.65 59 
Ethnicity 
White 720 67.94 11.71 74 
African American 3,853 49.32 13.10 49 
Asian/Pacific Islander 86 67.02 11.63 74 
American Indian/Alaska Native -- -- -- -- 
Hispanic 994 59.56 12.58 63 
Multiple -- -- -- -- 
Special Instruction Needs 
Economically Disadvantaged 4,470 51.21 13.00 52 
Not-economically Disadvantaged 1,330 63.81 12.05 69 
English Learner 760 57.89 12.70 60 
Non-English Learner 5,040 53.53 12.79 54 
Students with Disabilities 1,241 45.72 13.63 43 
Students without Disabilities 4,559 56.38 12.55 58 
Note: “--” indicates insufficient sample for SGP calculation for these tests. 
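
The four columns in the SGP tables are simple per-subgroup summaries of student-level growth percentiles. The following sketch shows one way such a row could be assembled; it is an assumption-laden illustration, not the report's procedure. The function name, the record layout, and the `min_n` suppression threshold are hypothetical (the source says only that "--" marks an insufficient sample and does not state the cutoff).

```python
from statistics import mean, median


def summarize_sgp(records, min_n=1):
    """Summarize one subgroup.

    records: list of (sgp, standard_error) pairs, one per student.
    min_n:   hypothetical minimum sample size; below it every cell is "--".
    """
    n = len(records)
    if n < min_n:
        return {"n": "--", "avg_sgp": "--", "avg_se": "--", "median_sgp": "--"}
    sgps = [sgp for sgp, _ in records]
    errors = [se for _, se in records]
    return {
        "n": n,
        "avg_sgp": round(mean(sgps), 2),
        "avg_se": round(mean(errors), 2),  # average of student-level standard errors
        "median_sgp": median(sgps),        # may end in .5 when n is even
    }


if __name__ == "__main__":
    toy = [(40, 12.0), (60, 13.0), (75, 12.5), (52, 13.1)]  # hypothetical students
    print(summarize_sgp(toy, min_n=2))
    print(summarize_sgp(toy[:1], min_n=2))
```

The half-point medians that appear in some rows (for example 76.5 or 53.5) are consistent with a median taken over an even number of integer SGPs.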
Total Sample Size  Average SGP  Average Standard Error  Median SGP

Gender 

Male 2,790 49.58 13.49 49 

Female 2,833 54.39 13.36 56 
Ethnicity 

White 559 59.38 13.33 63 

African American 3,830 49.17 13.59 49 

Asian/Pacific Islander 76 67.17 12.21 76.5 

American Indian/Alaska Native -- -- -- --

Hispanic 1,015 56.58 13.03 59

Multiple -- -- -- --
Special Instruction Needs 

Economically Disadvantaged 4,431 50.28 13.54 51 

Not-economically Disadvantaged 1,192 58.41 12.97 63 

English Learner 737 55.20 13.08 57

Non-English Learner 4,886 51.52 13.47 53 

Students with Disabilities 1,274 45.13 14.32 42 

Students without Disabilities 4,349 54.02 13.16 56 
Note: “--” indicates insufficient sample for SGP calculation for these tests. 

Table A.15.3 Summary of SGP Estimates for Subgroups: Grade 6 ELA/L 

Total Sample Size  Average SGP  Average Standard Error  Median SGP

Gender 

Male 2,585 50.05 13.46 50 

Female 2,639 56.81 13.12 60 
Ethnicity 

White 508 63.88 13.05 68 

African American 3,614 51.27 13.47 52 

Asian/Pacific Islander 67 66.79 13.19 68 

American Indian/Alaska Native -- -- -- --

Hispanic 907 53.96 12.85 55

Multiple -- -- -- --
Special Instruction Needs 

Economically Disadvantaged 4,109 51.78 13.35 52 

Not-economically Disadvantaged 1,115 59.68 13.07 64 

English Learner 484 53.56 12.99 53.5 

Non-English Learner 4,740 53.45 13.32 55 

Students with Disabilities 1,150 47.41 14.18 46 

Students without Disabilities 4,074 55.17 13.04 58 
Note: “--” indicates insufficient sample for SGP calculation for these tests. 
Total Sample Size  Average SGP  Average Standard Error  Median SGP

Gender 

Male 2,358 55.18 12.90 57 

Female 2,241 60.44 12.58 64 
Ethnicity 

White 416 63.80 13.69 66 

African American 3,179 56.80 12.69 59 

Asian/Pacific Islander 69 66.62 12.51 72 

American Indian/Alaska Native -- -- -- --

Hispanic 828 57.43 12.33 61

Multiple -- -- -- --
Special Instruction Needs 

Economically Disadvantaged 3,549 56.56 12.59 59 

Not-economically Disadvantaged 1,050 61.72 13.28 65 

English Learner 284 58.59 12.25 60 

Non-English Learner 4,315 57.69 12.78 61 

Students with Disabilities 1,047 52.02 13.23 52 

Students without Disabilities 3,552 59.43 12.60 63 
Note: “--” indicates insufficient sample for SGP calculation for these tests. 

Table A.15.5 Summary of SGP Estimates for Subgroups: Grade 8 ELA/L 

Total Sample Size  Average SGP  Average Standard Error  Median SGP

Gender 

Male 2,162 49.60 13.64 50 

Female 2,171 53.67 13.57 55 
Ethnicity 

White 430 57.91 14.29 61 

African American 3,026 49.99 13.52 50 

Asian/Pacific Islander 61 64.11 14.35 69 

American Indian/Alaska Native -- -- -- --

Hispanic 728 53.50 13.30 55

Multiple -- -- -- --
Special Instruction Needs 

Economically Disadvantaged 3,295 50.36 13.48 51 

Not-economically Disadvantaged 1,038 55.70 14.00 59 

English Learner 265 57.23 13.34 59 

Non-English Learner 4,068 51.27 13.62 52 

Students with Disabilities 975 46.22 14.31 44 

Students without Disabilities 3,358 53.21 13.40 55 
Note: “--” indicates insufficient sample for SGP calculation for these tests. 
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Total Sample 


Size Average SGP 
Gender 
Male 1,548 43.45 
Female 1,649 42.99 
Ethnicity 
White 295 52.44 
African American 2,244 41.27 
Asian/Pacific Islander 66 54.14 
American Indian/Alaska Native 13 32.69 
Hispanic 537 44.32 
Multiple -- -- 
Special Instruction Needs 
Economically Disadvantaged 2,458 41.77 
Not-economically Disadvantaged 739 47.99 
English Learner 210 47.77 
Non-English Learner 2,987 42.89 
Students with Disabilities 690 41.87 
Students without Disabilities 2,507 43.58 
Note: “--” indicates insufficient sample for SGP calculation for these tests. 
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Average 
Standard Error 


10.38 
10.58 


11.35 
10.32 
11.92 
10.88 
10.43 


Median SGP 


40 
40 


54 
38 
Table A.15.7 Summary of SGP Estimates for Subgroups: Grade 4 Mathematics


Total Sample Size  Average SGP  Average Standard Error  Median SGP
Gender 
Male 2,952 50.92 12.70 50 
Female 2,799 51.32 12.58 52 
Ethnicity 
White 719 61.02 13.18 65 
African American 3,844 47.99 12.53 47 
Asian/Pacific Islander 86 64.55 13.00 75.5 
American Indian/Alaska Native -- -- -- -- 
Hispanic 955 54.09 12.57 56 
Multiple -- -- -- --
Special Instruction Needs 
Economically Disadvantaged 4,424 49.54 12.55 49 
Not-economically Disadvantaged 1,327 56.38 12.97 61 
English Learner 722 54.87 12.62 56.5 
Non-English Learner 5,029 50.58 12.65 50 
Students with Disabilities 1,231 46.48 13.28 46 
Students without Disabilities 4,520 52.38 12.47 53 
Spanish Language Form 40 41.58 12.47 40.5 
Note: “--” indicates insufficient sample for SGP calculation for these tests. 
Table A.15.8 Summary of SGP Estimates for Subgroups: Grade 5 Mathematics


Total Sample Size  Average SGP  Average Standard Error  Median SGP
Gender 
Male 2,774 50.30 13.57 50 
Female 2,814 51.86 13.33 52 
Ethnicity 
White 559 54.82 13.46 57 
African American 3,830 47.70 13.62 46 
Asian/Pacific Islander 76 66.38 12.83 70.5 
American Indian/Alaska Native 10 43.80 11.27 38.5 
Hispanic 980 60.45 12.89 65 
Multiple -- -- -- --
Special Instruction Needs 
Economically Disadvantaged 4,398 50.12 13.54 49 
Not-economically Disadvantaged 1,190 54.66 13.11 56 
English Learner 703 61.56 13.21 65 
Non-English Learner 4,885 49.58 13.48 49 
Students with Disabilities 1,259 49.66 14.90 49 
Students without Disabilities 4,329 51.50 13.03 52 
Spanish Language Form 35 75.20 12.65 84 
Note: “--” indicates insufficient sample for SGP calculation for these tests. 
Table A.15.9 Summary of SGP Estimates for Subgroups: Grade 6 Mathematics


Total Sample Size  Average SGP  Average Standard Error  Median SGP
Gender 
Male 2,575 42.87 14.58 40 
Female 2,628 47.39 14.39 46 
Ethnicity 
White 503 61.10 12.51 67 
African American 3,617 41.98 14.91 39 
Asian/Pacific Islander 67 54.27 12.83 59 
American Indian/Alaska Native -- -- -- -- 
Hispanic 889 47.62 14.26 46 
Multiple -- -- -- --
Special Instruction Needs 
Economically Disadvantaged 4,096 42.73 14.86 40 
Not-economically Disadvantaged 1,107 54.13 13.10 57 
English Learner 467 46.33 15.07 45 
Non-English Learner 4,736 45.04 14.43 43 
Students with Disabilities 1,151 43.79 15.77 40 
Students without Disabilities 4,052 45.54 14.12 44 
Spanish Language Form 18 50.44 15.64 47.5 
Note: “--” indicates insufficient sample for SGP calculation for these tests. 




Table A.15.10 Summary of SGP Estimates for Subgroups: Grade 7 Mathematics 


Total Sample Size  Average SGP  Average Standard Error  Median SGP
Gender 
Male 2,279 50.07 15.36 49 
Female 2,191 52.27 15.16 53 
Ethnicity 
White 372 60.35 13.79 65.5 
African American 3,145 49.37 15.67 48 
Asian/Pacific Islander 62 62.81 13.82 66.5 
American Indian/Alaska Native -- -- -- --
Hispanic 794 51.80 14.62 54
Multiple -- -- -- --
Special Instruction Needs 
Economically Disadvantaged 3,505 49.51 15.58 49 
Not-economically Disadvantaged 965 57.10 14.13 61 
English Learner 257 51.46 15.66 51 
Non-English Learner 4,213 51.13 15.24 51 
Students with Disabilities 1,035 46.68 16.59 45 
Students without Disabilities 3,435 52.49 14.86 53 
Spanish Language Form 29 50.45 17.36 50 
Note: “--” indicates insufficient sample for SGP calculation for these tests. 
Table A.15.11 Summary of SGP Estimates for Subgroups: Grade 8 Mathematics


Total Sample Size  Average SGP  Average Standard Error  Median SGP
Gender 
Male 1,758 45.39 16.73 44 
Female 1,688 48.47 16.20 47 
Ethnicity 
White 173 46.18 14.55 43 
African American 2,672 46.75 16.60 46 
Asian/Pacific Islander 35 52.20 15.26 49 
American Indian/Alaska Native -- -- -- --
Hispanic 519 47.46 16.50 44
Multiple -- -- -- --
Special Instruction Needs 
Economically Disadvantaged 2,864 47.19 16.67 46 
Not-economically Disadvantaged 582 45.43 15.47 44 
English Learner 204 50.37 17.88 51 
Non-English Learner 3,242 46.67 16.38 45 
Students with Disabilities 899 44.52 17.66 43 
Students without Disabilities 2,547 47.73 16.05 46 
Spanish Language Form 28 47.86 19.12 48 
Note: “--” indicates insufficient sample for SGP calculation for these tests. 
Table A.15.12 Summary of SGP Estimates for Subgroups: Algebra I


Total Sample Size  Average SGP  Average Standard Error  Median SGP
Gender 
Male 392 41.06 13.24 37 
Female 451 44.13 14.12 40 
Ethnicity 
White 245 48.06 12.16 49 
African American 344 37.63 15.20 34 
Asian/Pacific Islander 26 52.15 11.99 57.5 
Hispanic 183 43.28 13.47 43 
Multiple -- -- -- --
Special Instruction Needs
Economically Disadvantaged 391 37.90 15.01 33
Not-economically Disadvantaged 452 46.86 12.59 45.5
English Learner 38 45.84 11.45 38.5 
Non-English Learner 805 42.56 13.82 39 
Students with Disabilities 68 51.54 14.41 57 
Students without Disabilities 775 41.93 13.65 38 
Spanish Language Form -- -- -- --
Note: “--” indicates insufficient sample for SGP calculation for these tests. 
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Total Sample 


Size  Average SGP  Average Standard Error  Median SGP
Gender
Male 1,170 52.96 15.67 53
Female 1,202 54.43 14.63 54
Ethnicity
White 101 57.15 13.25
African American 1,849 52.74 15.32
Asian/Pacific Islander 34 67.56 11.96
American Indian/Alaska Native 10 57.20 14.63
Hispanic 363 56.07 15.13
Multiple -- --
Special Instruction Needs
Economically Disadvantaged 2,037 53.18
Not-economically Disadvantaged
English Learner 192 55.64
Non-English Learner 2,180 53.54
Students with Disabilities 564 48.75
Students without Disabilities 1,808 55.25
Spanish Language Form 13 54.46
Note: “--” indicates insufficient sample for SGP calculation for these tests.
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Addendum 


The addendum presents the results of analyses for the fall/winter block 2018 operational administration. These 
results are reported separately from the spring 2019 results since fall testing included additional states or agencies 
and consisted of a nonrepresentative subset of students testing only ELA/L grades 9, 10, and 11, as well as Algebra
I, Geometry, and Algebra II. Both online and paper test forms were administered for each test.


To organize the addendum, tables are numbered sequentially according to the section represented by the tables. 
The reader can refer back to the corresponding section in the technical report for related information on the topic. 
For example, the first addendum table provides participation counts similar to those provided for Section 11;
therefore, it is numbered ADD.11.1. The second addendum table for Section 11 is numbered ADD.11.2, and so on.
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Table ADD.11.1 State Participation in ELA/L Fall 2018 Operational Tests, by Grade 


English Language Arts-Literacy

State       Category        Total    Grade 9   Grade 10   Grade 11
All States  N of Students   43,970   4,651     26,181     13,138
            N of CBT        43,806   4,643     26,130     13,033
            % of CBT        100      100       100        99
            N of PBT        164      8         51         105
            % of PBT        0        0         0          1
BIE         % of All Data   0        n/a       n/a        0
            N of Students   65       n/a       n/a        65
            N of CBT        14       n/a       n/a        14
            % of CBT        22       n/a       n/a        22
            N of PBT        51       n/a       n/a        51
            % of PBT        79       n/a       n/a        79
MD          % of All Data   51       n/a       46         4
            N of Students   22,212   n/a       20,318     1,894
            N of CBT        22,168   n/a       20,275     1,893
            % of CBT        100      n/a       100        100
            N of PBT        44       n/a       43         1
            % of PBT        0        n/a       0          0
NJ          % of All Data   32       10        13         9
            N of Students   13,979   4,264     5,628      4,087
            N of CBT        13,952   4,256     5,620      4,076
            % of CBT        100      100       100        100
            N of PBT        27       8         8          11
            % of PBT        0        0         0          0
NM          % of All Data   18       1         1          16
            N of Students   7,714    387       235        7,092
            N of CBT        7,672    387       235        7,050
            % of CBT        100      100       100        99
            N of PBT        42       0         0          42
            % of PBT        1        0         0          1


Note: BIE=Bureau of Indian Education, MD=Maryland, NJ=New Jersey, and 
NM=New Mexico; CBT=computer-based test; PBT=paper-based test; n/a=not 
applicable. 
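
The "% of All Data" rows in Table ADD.11.1 appear to be each state's student count divided by the all-states total of 43,970. The short Python sketch below recomputes the state-level shares from the counts in the table. The function name `pct_of_all_data` is hypothetical, and the round-half-up rule is an assumption, since the report does not state how percentages were rounded.

```python
import math

# All-states "N of Students" total, copied from Table ADD.11.1.
TOTAL_ELA = 43_970

# State-level "N of Students" totals, copied from Table ADD.11.1.
STATE_N = {"BIE": 65, "MD": 22_212, "NJ": 13_979, "NM": 7_714}


def pct_of_all_data(n: int, total: int = TOTAL_ELA) -> int:
    """Whole-number share of the all-states total.

    Rounds half up; this rounding rule is an assumption, not taken
    from the report.
    """
    return math.floor(100 * n / total + 0.5)


shares = {state: pct_of_all_data(n) for state, n in STATE_N.items()}
print(shares)
```

Under these assumptions the recomputed shares can be compared directly with the Total column of the "% of All Data" rows.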
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Table ADD.11.2 State Participation in Mathematics Fall 2018 Operational Tests, by Course 


Mathematics 
State 


All States 


BIE 


MD 


NJ 


NM 


Note: BIE=Bureau of Indian Education, MD=Maryland, NJ=New Jersey, and NM=New
Mexico; CBT=computer-based test; PBT=paper-based test; n/a=not applicable.


Category 
N of Students 
N of CBT 
% of CBT 
N of PBT 
% of PBT 
% of All Data 
N of Students 
N of CBT 
% of CBT 
N of PBT 
% of PBT 
% of All Data 
N of Students 
N of CBT 
% of CBT 
N of PBT 
% of PBT 
% of All Data 
N of Students 
N of CBT 
% of CBT 
N of PBT 
% of PBT 
% of All Data 
N of Students 
N of CBT 
% of CBT 
N of PBT 
% of PBT 


Total 
48,917 
48,671 

100 
246 


20,342 
20,284 
100 

58 

0 

44 
21,594 
21,566 
100 

28 


A1=Algebra I, GO=Geometry, A2=Algebra II.
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Al 
32,649 
32,576 

100 

73 


0 


1,760 
1,741 
99 

19 

1 
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Mathematics 
State 


All States 


BIE 


MD 


NJ 


NM 


Note: BIE=Bureau of Indian Education, MD=Maryland, NJ=New Jersey, and NM=New
Mexico; CBT=computer-based test; PBT=paper-based test; n/a=not applicable.


Category 
N of Students 
N of CBT 
% of CBT 
N of PBT 
% of PBT 
% of All Data 
N of Students 
N of CBT 
% of CBT 
N of PBT 
% of PBT 
% of All Data 
N of Students 
N of CBT 
% of CBT 
N of PBT 
% of PBT 
% of All Data 
N of Students 
N of CBT 
% of CBT 
N of PBT 
% of PBT 
% of All Data 
N of Students 
N of CBT 
% of CBT 
N of PBT 
% of PBT 


Total 
531 


330 
330 
100 
n/a 
n/a 
14 
73 
73 
100 
n/a 
n/a 


A1=Algebra I, GO=Geometry, A2=Algebra II.
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n/a 


n/a 


n/a 
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Grade Mode 


Valid Cases  Female N  Female %  Male N  Male %
9   All  4,651   2,352   50.6  2,299   49.4
9   CBT  4,643   2,349   50.6  2,294   49.4
9   PBT  8       n/r     n/r   n/r     n/r
10  All  26,181  11,459  43.8  14,722  56.2
10  CBT  26,130  11,443  43.8  14,687  56.2
10  PBT  51      n/r     n/r   35      68.6
11  All  13,138  5,591   42.6  7,547   57.4
11  CBT  13,033  5,544   42.5  7,489   57.5
11  PBT  105     47      44.8  58      55.2


Note: BIE=Bureau of Indian Education, MD=Maryland, NJ=New Jersey, and NM=New
Mexico; CBT=computer-based test; PBT=paper-based test.


Table ADD.11.5 All States Combined: Fall 2018 Mathematics Students by Course and Gender 


Course  Mode  Valid Cases  Female N  Female %  Male N  Male %
A1  All  32,649  15,886  48.7  16,763  51.3
A1  CBT  32,576  15,856  48.7  16,720  51.3
A1  PBT  73      30      41.1  43      58.9
A2  All  10,312  5,227   50.7  5,085   49.3
A2  CBT  10,179  5,159   50.7  5,020   49.3
A2  PBT  133     68      51.1  65      48.9
GO  All  5,956   2,918   49.0  3,038   51.0
GO  CBT  5,916   2,898   49.0  3,018   51.0
GO  PBT  40      20      50.0  20      50.0


Note: BIE=Bureau of Indian Education, MD=Maryland, NJ=New Jersey, and NM=New
Mexico; CBT=computer-based test; PBT=paper-based test;
A1=Algebra I, GO=Geometry, A2=Algebra II;
n/a=not applicable; and n/r=not reported due to n<20.
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Table ADD.11.6 All States Combined: Fall 2018 Spanish-Language Mathematics Students by Course and 
Gender 


Course  Mode  Valid Cases  Female N  Female %  Male N  Male %
A1  All  327  172  52.6  155  47.4
A1  CBT  308  165  53.6  143  46.4
A1  PBT  19   n/r  n/r   n/r  n/r
A2  All  108  50   46.3  58   53.7
A2  CBT  108  50   46.3  58   53.7
GO  All  96   44   45.8  52   54.2
GO  CBT  96   44   45.8  52   54.2


Note: BIE=Bureau of Indian Education, MD=Maryland, NJ=New Jersey, and NM=New Mexico;
CBT=computer-based test; PBT=paper-based test;
A1=Algebra I, GO=Geometry, A2=Algebra II;
n/a=not applicable; and n/r=not reported due to n<20.
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Table ADD.11.7 Demographic Information for Fall 2018 Grade 9 ELA/L, Overall and by State 


Demographic All States BIE MD NJ NM 
Economically Disadvantaged 33.0 n/a n/a 33.2 31.8 
Student with Disabilities 18.6 n/a n/a 19.7 6.2 
English learner 1.4 n/a n/a 0.8 7.8 
Male 49.4 n/a n/a 49.7 46.0 
Female 50.6 n/a n/a 50.3 54.0 
American Indian/Alaska Native 1.2 n/a n/a n/r 12.1 
Asian 6.6 n/a n/a 7.0 n/r 
Black/African American 18.6 n/a n/a 20.2 n/r 
Hispanic/Latino 24.2 n/a n/a 21.4 55.3 
White/Caucasian 46.0 n/a n/a 48.0 24.0 
Native Hawaiian/Pacific Islander n/r n/a n/a n/r n/a
Two or More Races Reported 2.8 n/a n/a 3.0 n/r 
Unknown n/r n/a n/a n/a n/r 


Note: All States = data from all participating states combined; BIE=Bureau of Indian Education, 
MD=Maryland, NJ=New Jersey, and NM=New Mexico;
n/a = not applicable; and n/r = not reported due to n<20. 


Table ADD.11.8 Demographic Information for Fall 2018 Grade 10 ELA/L, Overall and by State 


Demographic All States BIE MD NJ NM 
Economically Disadvantaged 44.2 n/a 48.3 29.7 39.6 
Student with Disabilities 24.5 n/a 26.1 19.5 n/r 
English learner 13.8 n/a 17.0 2.4 8.5 
Male 56.2 n/a 58.0 49.9 50.2 
Female 43.8 n/a 42.0 50.1 49.8 
American Indian/Alaska Native 0.4 n/a 0.3 n/r 19.1 
Asian 3.8 n/a 2.5 8.3 n/r 
Black/African American 40.9 n/a 47.0 20.5 n/r 
Hispanic/Latino 22.3 n/a 22.5 20.5 47.7 
White/Caucasian 29.7 n/a 24.6 48.3 30.2 
Native Hawaiian/Pacific 01 aia sie ile sig 
Islander 

Two or More Races Reported 2.8 n/a 3.0 2.0 n/r 
Unknown n/a n/a n/a n/a n/a 


Note: All States = data from all participating states combined; BIE=Bureau of Indian Education, 
MD=Maryland, NJ=New Jersey, and NM=New Mexico;
n/a = not applicable; and n/r = not reported due to n<20. 
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Table ADD.11.9 Demographic Information for Fall 2018 Grade 11 ELA/L, Overall and by State 


Demographic All States BIE MD NJ NM 
Economically Disadvantaged 47.2 95.4 40.0 31.3 57.9 
Student with Disabilities 22.6 n/r 23.5 19.3 24.3 
English learner 12.8 73.8 5.0 1.3 21.0 
Male 57.4 60.0 62.5 51.9 59.2 
Female 42.6 40.0 37.5 48.1 40.8 
American Indian/Alaska Native 7.1 96.9 n/r n/r 12.1 
Asian 3.1 n/a 2.8 6.6 1.1 
Black/African American 14.9 n/a 43.5 23.0 2.6 
Hispanic/Latino 43.1 n/a 10.8 20.2 65.3 
White/Caucasian 29.0 n/a 39.0 47.7 15.8 
Native Hawaiian/Pacific 02 aia ai sie ale 
Islander 

Two or More Races Reported 1.9 n/r 3.7 2.2 1.2 
Unknown 0.9 n/r n/a n/a 1.6 


Note: All States = data from all participating states combined; BIE=Bureau of Indian Education,
MD=Maryland, NJ=New Jersey, and NM=New Mexico;
n/a = not applicable; and n/r = not reported due to n<20.


Table ADD.11.10 Demographic Information for Fall 2018 Algebra I, Overall and by State 


Demographic All States BIE MD NJ NM 
Economically Disadvantaged 43.3 n/a 49.4 35.0 47.7 
Student with Disabilities 21.4 n/a 25.4 16.2 17.0 
English learner 11.3 n/a 15.2 6.0 11.7 
Male 51.3 n/a 53.0 49.2 48.1 
Female 48.7 n/a 47.0 50.8 51.9 
American Indian/Alaska Native 0.3 n/a 0.4 n/r 4.0 
Asian 4.3 n/a 2.5 6.8 n/r 
Black/African American 35.1 n/a 46.8 20.5 n/r 
Hispanic/Latino 24.7 n/a 21.6 27.5 62.8 
White/Caucasian 32.8 n/a 25.5 43.0 27.5 
Native Hawaiian/Pacific 02 a 01 02 ie 
Islander 

Two or More Races Reported 2.5 n/a 3.0 1.8 n/a 
Unknown 0.1 n/a n/a n/r 4.2 


Note: All States = data from all participating states combined; BIE=Bureau of Indian Education,
MD=Maryland, NJ=New Jersey, and NM=New Mexico;
n/a = not applicable; and n/r = not reported due to n<20.


Table ADD.11.11 Demographic Information for Fall 2018 Geometry, Overall and by State 


Demographic All States BIE MD NJ NM 
Economically Disadvantaged 39.5 n/r 56.6 33.0 53.6 
Student with Disabilities 18.4 n/r n/r 18.2 19.0 
English learner 8.0 n/r n/r 3.4 18.0 
Male 51.0 n/r 66.0 51.2 49.9 
Female 49.0 n/r n/r 48.8 50.1 
American Indian/Alaska Native 4.0 n/r n/a n/r 12.3 
Asian 5.1 n/a n/r 7.0 n/r 
Black/African American 14.7 n/a n/r 19.9 2.8 
Hispanic/Latino 35.9 n/a n/r 24.0 64.9 
White/Caucasian 37.5 n/a 75.5 46.7 15.2 
Native Hawaiian/Pacific Islander n/r n/a n/a n/r n/r
Two or More Races Reported 1.7 n/a n/r 2.1 n/r 
Unknown 0.8 n/a n/a n/r 2.8 


Note: All States = data from all participating states combined; BIE=Bureau of Indian Education, 
MD=Maryland, NJ=New Jersey, and NM=New Mexico 
n/a = not applicable; and n/r = not reported due to n<20. 


Table ADD.11.12 Demographic Information for Fall 2018 Algebra II, Overall and by State 


Demographic All States BIE MD NJ NM 
Economically Disadvantaged 41.5 87.0 31.2 28.6 55.1 
Student with Disabilities 13.5 n/r 10.3 13.9 14.7 
English learner 10.5 69.0 2.0 3.6 18.3 
Male 49.3 43.0 51.0 48.6 49.4 
Female 50.7 57.0 49.0 51.4 50.6 
American Indian/Alaska Native 8.1 97.0 n/r n/r 15.7 
Asian 5.0 n/a 4.4 10.4 0.9 
Black/African American 11.6 n/a 21.1 18.4 2.5 
Hispanic/Latino 36.4 n/a 9.3 21.0 60.7 
White/Caucasian 35.5 n/a 59.5 48.2 16.3 
Native Hawaiian/Pacific 02 ae ale ie fie 
Islander 

Two or More Races Reported 2.4 n/r 5.5 1.6 1.8 
Unknown 0.8 n/r n/a n/a 1.8 


Note: All States = data from all participating states combined; BIE=Bureau of Indian Education, 
MD=Maryland, NJ=New Jersey, and NM=New Mexico;
n/a = not applicable; and n/r = not reported due to n<20. 
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Group Type Group N Mean SD Min Max 
Full Summative Score 4,651 744.52 35.20 650 850
Gender
Female 2,352 752.50 33.58 653 850
Male 2,299 736.36 34.94 650 841
Ethnicity
American Indian/Alaska Native 56 751.79 31.41 661 808
Asian 307 763.90 33.57 650 850
Black/African American 865 731.91 32.70 650 850
Hispanic/Latino 1,127 736.33 32.83 650 841
Native Hawaiian/Pacific Islander n/r n/r n/r n/r n/r
Two or more races 132 743.40 34.08 661 825
White 2,140 750.73 34.97 650 850
Economic Status*
Not Economically Disadvantaged 3,114 750.32 34.38 650 850
Economically Disadvantaged 1,537 732.77 33.89 650 850
English Learner Status
Non English Learner 4,588 745.00 35.00 650 850
English Learner 63 709.32 31.55 650 799
Disabilities
Students without Disabilities 3,785 750.87 32.82 650 850
Students with Disabilities 866 716.76 31.61 650 850
Reading Summative Score 4,651 48.17 14.39 10 90
Gender
Female 2,352 50.12 13.95 12 90
Male 2,299 46.16 14.57 10 90
Ethnicity
American Indian/Alaska Native 56 51.27 12.33 15 75
Asian 307 55.12 13.61 10 90
Black/African American 865 43.64 13.32 10 84
Hispanic/Latino 1,127 44.72 13.59 10 87
Native Hawaiian/Pacific Islander n/r n/r n/r n/r n/r
Two or more races 132 47.49 13.74 15 79
White 2,140 50.68 14.44 10 90
Economic Status*
Not Economically Disadvantaged 3,114 50.42 14.13 10 90
Economically Disadvantaged 1,537 43.60 13.83 10 87
English Learner Status
Non English Learner 4,588 48.36 14.32 10 90
English Learner 63 34.43 13.03 10 81
Disabilities
Students without Disabilities 3,785 50.56 13.60 10 90
Students with Disabilities 866 37.73 13.06 10 90
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Group Type Group N Mean SD Min Max 
Writing Summative Score 4,651 31.72 11.27 10 60
Gender
Female 2,352 35.01 9.80 10 60
Male 2,299 28.36 11.68 10 60
Ethnicity
American Indian/Alaska Native 56 33.23 10.45 10 49
Asian 307 37.43 9.48 10 60
Black/African American 865 27.88 11.32 10 60
Hispanic/Latino 1,127 29.72 11.14 10 53
Native Hawaiian/Pacific Islander n/r n/r n/r n/r n/r
Two or more races 132 31.94 11.14 10 60
White 2,140 33.37 10.88 10 60
Economic Status*
Not Economically Disadvantaged 3,114 33.35 10.81 10 60
Economically Disadvantaged 1,537 28.43 11.46 10 60
English Learner Status
Non English Learner 4,588 31.86 11.20 10 60
English Learner 63 21.59 11.39 10
Disabilities
Students without Disabilities 3,785 33.70 10.25 10 60
Students with Disabilities 866 23.07 11.45 10 53


Note: This table is identical to Table 12.8 in Section 12. *Economic status was based on participation in
National School Lunch Program (NSLP): receipt of free or reduced-price lunch (FRL). n/r = not reported
due to n<20.
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Group Type Group N Mean SD Min Max 
Full Summative Score 26,181 717.92 46.86 650 850
Gender
Female 11,459 727.05 49.70 650 850
Male 14,722 710.81 43.21 650 850
Ethnicity
American Indian/Alaska Native 108 720.44 34.22 650 799
Asian 989 752.04 56.69 650 850
Black/African American 10,715 702.16 34.52 650 850
Hispanic/Latino 5,835 704.13 39.72 650 850
Native Hawaiian/Pacific Islander 26 736.42 51.11 662 843
Two or more races 724 726.30 46.46 650 850
White 7,784 744.73 50.69 650 850
Economic Status*
Not Economically Disadvantaged 14,609 730.29 50.40 650 850
Economically Disadvantaged 11,572 702.31 36.39 650 850
English Learner Status
Non English Learner 22,580 722.96 47.24 650 850
English Learner 3,601 686.28 28.47 650 799
Disabilities
Students without Disabilities 19,761 724.08 47.77 650 850
Students with Disabilities 6,419 698.95 38.10 650 850
Reading Summative Score 26,181 37.83 18.76 10 90
Gender
Female 11,459 40.10 19.62 10 90
Male 14,722 36.05 17.85 10 90
Ethnicity
American Indian/Alaska Native 108 38.25 14.36 10 74
Asian 989 50.63 23.19 10 90
Black/African American 10,715 32.30 14.34 10 90
Hispanic/Latino 5,835 31.61 15.81 10 90
Native Hawaiian/Pacific Islander 26 45.04 22.76 10 90
Two or more races 724 41.52 18.59 10 90
White 7,784 48.08 20.24 10 90
Economic Status*
Not Economically Disadvantaged 14,609 42.61 20.09 10 90
Economically Disadvantaged 11,572 31.78 14.87 10 90
English Learner Status
Non English Learner 22,580 39.99 18.83 10 90
English Learner 3,601 24.25 11.03 10 79
Disabilities
Students without Disabilities 19,761 40.02 19.08 10 90
Students with Disabilities 6,419 31.09 15.94 10 90
Writing Summative Score 26,181 25.88 12.88 10 60
Gender
Female 11,459 29.07 13.26 10 60
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Group Type 


Group N Mean SD Min Max
Male 14,722 23.40 12.00 10 60
Ethnicity
American Indian/Alaska Native 108 27.08 10.46 10 48
Asian 989 34.42 14.06 10 60
Black/African American 10,715 21.63 10.47 10 60
Hispanic/Latino 5,835 23.33 11.46 10 60
Native Hawaiian/Pacific Islander 26 30.77 12.24 10 51
Two or more races 724 27.62 12.87 10 60
White 7,784 32.37 13.59 10 60
Economic Status*
Not Economically Disadvantaged 14,609 28.85 13.57 10 60
Economically Disadvantaged 11,572 22.13 10.84 10 60
English Learner Status
Non English Learner 22,580 26.93 13.03 10 60
English Learner 3,601 19.31 9.54 10 47
Disabilities
Students without Disabilities 19,761 27.60 12.97 10 60
Students with Disabilities 6,419 20.59 11.03 10 60


Note: *Economic status was based on participation in National School Lunch Program (NSLP): receipt
of free or reduced-price lunch (FRL).
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Group Type Group N Mean SD Min Max 
Full Summative Score 13,138 719.36 38.49 650 850
Gender
Female 5,591 728.99 39.61 650 850
Male 7,547 712.22 36.02 650 850
Ethnicity
American Indian/Alaska Native 930 715.01 31.74 650 827
Asian 403 746.89 47.49 650 850
Black/African American 1,951 712.87 36.34 650 839
Hispanic/Latino 5,657 710.00 33.02 650 842
Native Hawaiian/Pacific Islander 25 731.12 42.37 653 808
Two or more races 249 732.58 42.36 650 842
White 3,810 733.80 41.43 650 850
Economic Status*
Not Economically Disadvantaged 6,733 728.33 41.20 650 850
Economically Disadvantaged 6,207 709.69 32.93 650 842
English Learner Status
Non English Learner 11,454 722.61 39.01 650 850
English Learner 1,684 697.24 25.40 650 798
Disabilities
Students without Disabilities 10,056 725.34 38.66 650 850
Students with Disabilities 2,968 699.03 30.42 650 829
Reading Summative Score 13,138 38.87 15.25 10 90
Gender
Female 5,591 41.83 15.65 10 90
Male 7,547 36.68 14.56 10 90
Ethnicity
American Indian/Alaska Native 930 35.42 12.24 10 84
Asian 403 48.87 18.17 10 90
Black/African American 1,951 36.53 14.38 10 86
Hispanic/Latino 5,657 35.27 13.08 10 89
Native Hawaiian/Pacific Islander 25 44.64 17.59 13 79
Two or more races 249 44.16 16.66 10 89
White 3,810 44.83 16.53 10 90
Economic Status*
Not Economically Disadvantaged 6,733 42.48 16.28 10 90
Economically Disadvantaged 6,207 34.96 13.04 10 89
English Learner Status
Non English Learner 11,454 40.24 15.44 10 90
English Learner 1,684 29.56 9.69 10 74
Disabilities
Students without Disabilities 10,056 41.17 15.31 10 90
Students with Disabilities 2,968 31.06 12.27 10 82
Writing Summative Score 13,138 22.94 13.02 10 60
Gender
Female 5,591 26.73 13.18 10 60
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Group Type 


Group N Mean SD Min Max
Male 7,547 20.13 12.16 10 60
Ethnicity
American Indian/Alaska Native 930 24.20 11.69 10 56
Asian 403 31.17 14.75 10 60
Black/African American 1,951 20.82 12.45 10 52
Hispanic/Latino 5,657 20.22 11.79 10 60
Native Hawaiian/Pacific Islander 25 24.84 14.11 10 44
Two or more races 249 26.31 14.03 10 56
White 3,810 26.62 13.77 10 60
Economic Status*
Not Economically Disadvantaged 6,733 25.33 13.67 10 60
Economically Disadvantaged 6,207 20.36 11.81 10 60
English Learner Status
Non English Learner 11,454 23.70 13.22 10 60
English Learner 1,684 17.77 10.20 10 46
Disabilities
Students without Disabilities 10,056 24.64 13.22 10 60
Students with Disabilities 2,968 17.15 10.52 10 60


Note: *Economic status was based on participation in National School Lunch Program (NSLP): receipt
of free or reduced-price lunch (FRL). n/r = not reported due to n<20.
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Group Type Group N Mean SD Min Max 
Full Summative 
Score 32,649 725.43 30.32 650 850 
Female 15,886 727.24 30.36 650 850 
Gender 
Male 16,763 723.72 30.17 650 850 
American Indian/Alaska 
Native 108 718.12 28.37 650 793 
Asian 1,411 746.75 37.01 650 850 
Black/African American 11,459 714.02 24.16 650 850 
Ethnicity Hispanic/Latino 8,060 719.67 27.27 650 850 
Native Hawaiian/Pacific 
Islander 61 734.98 31.31 650 807 
Two or more races 805 726.8 31 650 841 
White 10,720 738.99 30.57 650 850 
Not Economically 
Disadvantaged 18,497 731.89 31.76 650 850 
Economic Status* 
Economically 
Disadvantaged 14,149 717 26 650 850 
English Learner Status Non English Learner 28,962 728.04 30.17 650 850
English Learner 3,687 705 22.73 650 844
Disabilities Students without Disabilities 25,644 729.02 30.38 650 850 
Students with Disabilities 7,001 712.32 26.21 650 841 
Language Form Spanish 327 703.13 20.3 650 754


Note: This table is identical to Table 12.10 in Section 12. *Economic status was based on
participation in National School Lunch Program (NSLP): receipt of free or reduced-price lunch
(FRL).


Table ADD.12.5 Subgroup Performance for Mathematics Scale Scores: Geometry 


Group Type Group N Mean SD Min Max 
Full Summative 
Score 10,312 713.12 39.07 650 850 
Female 5,227 714.46 37.73 650 850 
Gender 
Male 5,085 711.74 40.35 650 850 
American Indian/Alaska 
Native 832 695.45 23.59 650 809 
Asian 514 761.25 49.22 650 850 
Black/African American 1,196 701.48 30.06 650 850 
Ethnicity Hispanic/Latino 3,755 698.83 27.46 650 850 
Native Hawaiian/Pacific 
Islander n/r n/r n/r n/r n/r 
Two or more races 243 718 40.67 650 833 
White 3,663 728.72 41.55 650 850 
Not Economically 
Disadvantaged 5,808 723.97 42.99 650 850 
Economic Status* 
Economically 
Disadvantaged 4,276 699.26 27.81 650 850 
English Learner Status Non English Learner 9,227 716.04 39.58 650 850
English Learner 1,083 688.21 22.22 650 829
Disabilities Students without Disabilities 8,814 716.46 39.49 650 850 
Students with Disabilities 1,397 693.06 29.91 650 840 
Language Form Spanish 108 679.4 18.26 650 725 


Note: *Economic status was based on participation in National School Lunch Program (NSLP): 
receipt of free or reduced-price lunch (FRL). 
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Table ADD.12.6 Subgroup Performance for Mathematics Scale Scores: Algebra II 
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Group Type Group N Mean SD Min Max 
Full Summative 
Score 5,956 725.07 25.61 650 850 
Gender Female 2,918 725.8 25.06 650 850
Male 3,038 724.38 26.12 650 845
American Indian/Alaska 
Native 238 709.8 14.53 660 750 
Asian 306 751.33 32.25 680 845
Black/African American 876 718.93 22.12 650 786 
Ethnicity Hispanic/Latino 2,139 715.67 21.18 650 811 
Native Hawaiian/Pacific 
Islander n/r n/r n/r n/r n/r 
Two or more races 101 729.97 23.86 669 800 
White 2,234 734.05 24.78 654 850 
Not Economically 
Disadvantaged 3,585 730.5 26.77 650 850 
Economic Status*
Economically 
Disadvantaged 2,353 716.89 21.32 650 800 
English Learner Status Non English Learner 5,482 726.7 25.58 650 850
English Learner 474 706.31 17.1 650 785
Disabilities Students without Disabilities 4,841 727.93 25.98 650 850 
Students with Disabilities 1,096 712.58 19.65 650 790 
Language Form Spanish 96 702.61 15.55 650 740 


Note: *Economic status was based on participation in National School Lunch Program (NSLP):
receipt of free or reduced-price lunch (FRL). n/r = not reported due to n<20.
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Addendum 13: Reliability 


Table ADD.13.1 shows the total-group reliability estimates and raw score SEM for the fall 2018 forms. Tables
ADD.13.2 through ADD.13.7 show the subgroup reliability estimates and raw score SEM. A minimum sample size of 100
per core form was required for calculating the reliability estimates for subgroups; therefore, the subgroup totals
may not equal the total group sample size. Tables ADD.13.8 through ADD.13.10 provide the claim and subclaim reliability
and raw score SEM estimates for the fall 2018 forms. The paper-based tests did not have sufficient sample sizes for
reliability analyses.
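
The reliability tables that follow report an "Alpha" value and a raw score SEM for each group. As an illustration only, the Python sketch below computes coefficient alpha from a small item-score matrix and derives a standard error of measurement with the classical formula SEM = SD × sqrt(1 − reliability). The function names `cronbach_alpha` and `raw_score_sem` and the toy data are hypothetical, and the classical formulas are an assumption here; the report's own estimation procedure is documented in Section 13 and may differ (for example, in how mixed item types are handled).

```python
import math
from statistics import pvariance


def cronbach_alpha(scores):
    """Coefficient alpha for a list of per-student item-score lists.

    Assumes every student has a score on the same k >= 2 items and
    that total scores are not all identical.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def raw_score_sem(sd, reliability):
    """Classical standard error of measurement on the raw-score scale."""
    return sd * math.sqrt(1 - reliability)


# Toy data (hypothetical): 5 students by 4 items.
toy = [
    [1, 2, 2, 3],
    [0, 1, 1, 2],
    [2, 2, 3, 4],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
]
alpha = cronbach_alpha(toy)
sd_total = math.sqrt(pvariance([sum(row) for row in toy]))
print(round(alpha, 3), round(raw_score_sem(sd_total, alpha), 3))
```

Because SEM shrinks as reliability rises and grows with the spread of raw scores, a subgroup with a narrow score distribution can show both a lower alpha and a smaller SEM than the total group.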


Table ADD.13.1 Summary of ELA/L Test Reliability Estimates for Fall 2018 Total Group 


                                                                                     Minimum Reliability  Maximum Reliability
Grade Level  Number of Forms  Max. Possible Score  Avg. Raw Score SEM  Average Reliability  N  Alpha  N  Alpha

ELA09  2  109  5.79  0.93  177     0.87  4,318   0.93
ELA10  2  109  5.51  0.94  204     0.87  13,359  0.94
ELA11  2  109  5.30  0.93  175     0.79  9,920   0.93
ALG01  2  81   3.45  0.93  13,504  0.93  1,312   0.93
GEO01  2  81   3.28  0.92  779     0.91  4,216   0.92
ALG02  2  81   3.39  0.94  709     0.93  7,238   0.94
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Table ADD.13.2 Summary of Test Reliability Estimates for Fall 2018 Subgroups: ELA/L Grade 9 


Max. Raw Avg. Avg. Minimum Reliability Maximum Reliability
Score SEM Reliability N Alpha N Alpha
Total Group 109 5.79 0.93 177 0.87 4,318 0.93
Gender 
Male 109 5.61 0.93 112 0.87 2,089 0.93 
Female 109 5.99 0.92 2,229 0.92 2,229 0.92 
Ethnicity 
White 109 5.91 0.93 2,007 0.93 2,007 0.93 
Black/African American 109 5.76 0.92 762 0.92 762 0.92 
Asian/Pacific Islander 109 5.7f 0.93 300 0.93 300 0.93 
American Indian/Alaska Native n/r n/r n/r n/r n/r n/r n/r 
Hispanic/Latino 109 5.81 0.92 1,048 0.92 1,048 0.92 
Multiple 109 5.74 0.94 126 0.94 126 0.94 
Special Instruction Needs 
Economically Disadvantaged 109 5.70 0.92 1,383 0.92 1,383 0.92 
Not Economically Disadvantaged 109 5.91 0.93 2,935 0.93 2,935 0.93 
English Learner n/r n/r n/r n/r n/r n/r n/r 
Non-English Learner 109 5.80 0.93 177 0.87 4,258 0.93 
Students with Disabilities 109 5.21 0.91 177 0.87 649 0.92 
Students without Disabilities 109 5.92 0.92 3,669 0.92 3,669 0.92 
Students Taking Accommodated 
Forms 
ASL n/r n/r n/r n/r n/r n/r n/r 
Closed-Caption n/r n/r n/r n/r n/r n/r n/r 
Screen Reader n/r n/r n/r n/r n/r n/r n/r 
Text-to-Speech 109 4.65 0.87 177 0.87 177 0.87 


n/r = not reported due to n<100. 
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Table ADD.13.3 Summary of Test Reliability Estimates for Fall 2018 Subgroups: ELA/L Grade 10 


Max. Raw Avg. Avg. Minimum Reliability Maximum Reliability
Score SEM Reliability N Alpha N Alpha
Total Group 109 5.51 0.94 204 0.87 13,359 0.94
Gender 
Male 109 5.33 0.93 131 0.86 7,039 0.94 
Female 109 5.71 0.94 6,320 0.94 6,320 0.94 
Ethnicity 
White 109 5.77 0.93 5,863 0.93 5,863 0.93 
Black/African American 109 5.11 0.90 3,883 0.90 3,883 0.90 
Asian/Pacific Islander 109 5.89 0.94 697 0.94 697 0.94 
American Indian/Alaska Native n/r n/r n/r n/r n/r n/r n/r 
Hispanic/Latino 109 5.30 0.92 2,458 0.92 2,458 0.92 
Multiple 109 5.62 0.94 376 0.94 376 0.94 
Special Instruction Needs 
Economically Disadvantaged 109 5.22 0.91 4,439 0.91 4,439 0.91 
Not Economically Disadvantaged 109 5.65 0.94 112 0.87 8,920 0.94 
English Learner 109 4.50 0.82 1,042 0.82 1,042 0.82 
Non-English Learner 109 5.58 0.94 197 0.87 12,317 0.94 
Students with Disabilities 109 4.99 0.92 204 0.87 2,783 0.93 
Students without Disabilities 109 5.65 0.94 10,575 0.94 10,575 0.94 
Students Taking Accommodated
Forms 
ASL n/r n/r n/r n/r n/r n/r n/r 
Closed-Caption n/r n/r n/r n/r n/r n/r n/r 
Screen Reader n/r n/r n/r n/r n/r n/r n/r 
Text-to-Speech 109 4.64 0.84 190 0.84 190 0.84 


n/r = not reported due to n<100. 
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Table ADD.13.4 Summary of Test Reliability Estimates for Fall 2018 Subgroups: ELA/L Grade 11 


Max. Raw Avg. Avg. Minimum Reliability Maximum Reliability
Score SEM Reliability N Alpha N Alpha
Total Group 109 5.30 0.93 175 0.79 9,920 0.93 
Gender 
Male 109 4.97 0.92 128 0.82 5,520 0.92 
Female 109 5.70 0.93 4,400 0.93 4,400 0.93 
Ethnicity 
White 109 5.81 0.93 3,217 0.93 3,217 0.93 
Black/African American 109 5.41 0.92 1,099 0.92 1,099 0.92 
Asian/Pacific Islander 109 5.67 0.95 349 0.95 349 0.95 
American Indian/Alaska Native 109 5.07 0.91 509 0.91 509 0.91 
Hispanic/Latino 109 4.85 0.90 4,464 0.90 4,464 0.90 
Multiple 109 5.98 0.93 189 0.93 189 0.93 
Special Instruction Needs 
Economically Disadvantaged 109 4.90 0.91 4,461 0.91 4,461 0.91 
Not Economically Disadvantaged 109 5.64 0.93 5,326 0.93 5,326 0.93 
English Learner 109 4.18 0.77 1,117 0.77 1,117 0.77
Non-English Learner 109 5.42 0.93 152 0.80 8,803 0.93 
Students with Disabilities 109 4.39 0.88 175 0.79 2,103 0.89 
Students without Disabilities 109 5.53 0.93 7,757 0.93 7,757 0.93 
Students Taking Accommodated 
Forms 
ASL n/r n/r n/r n/r n/r n/r n/r 
Closed-Caption n/r n/r n/r n/r n/r n/r n/r 
Screen Reader n/r n/r n/r n/r n/r n/r n/r 
Text-to-Speech 109 4.40 0.79 169 0.79 169 0.79 


n/r = not reported due to n<100. 
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Max. 
Raw 
Score 

Total Group 81 
Gender 
Male 81 
Female 81 
Ethnicity 
White 81 
Black/African American 81 
Asian/Pacific Islander 81 
American Indian/Alaska Native n/r 
Hispanic/Latino 81 
Multiple 81 
Special Instruction Needs 
Economically Disadvantaged 81 
Not Economically Disadvantaged 81 
English Learner 81 
Non-English Learner 81 
Students with Disabilities 81 
Students without Disabilities 81 
Students Taking Accommodated
Forms 
ASL n/r 
Closed-Caption n/r 
Screen Reader n/r 
Text-to-Speech 81 
Students Taking Translated Forms 
Spanish Language Form 81 


n/r = not reported due to n<100. 




Avg. Minimum Reliability Maximum Reliability 


Avg. 
SEM Reliability 
3.45 0.93 
3.40 0.93 
3.50 0.93 
3.62 0.93 
3.11 0.86 
3.71 0.96 
n/r n/r 
3.29 0.89 
3.47 0.93 
3.55 0.93 
2.85 0.86 
3.48 0.93 
3.09 0.89 
3.51 0.93 
3.55 0.93 
n/r n/r 
n/r n/r 
n/r n/r 
3.60 0.93 
2.61 0.56 




N 


13,504 


6,714 
639 


5,569 
245 
722 

n/r 
401 
367 


8,755 
201 
1,111 
392 
920 
8,755 


n/r 
n/r 


n/r 


1,199 


126 


Alpha 


0.93 


0.93 
0.92 


0.93 
0.84 
0.96 

n/r 
0.86 
0.93 


0.93 
0.82 
0.93 
0.85 
0.93 
0.93 


n/r 
n/r 


n/r 


0.93 


0.56 


N 


1,312 


673 
6,790 


555 
3,961 
722 
n/r 
2,786 
367 


814 
958 
12,546 
2,574 
10,926 
814 


n/r 
n/r 


n/r 


1,199 


126 


Alpha 


0.93 


0.94 
0.93 


0.93 
0.87 
0.96 

n/r 
0.90 
0.93 


0.93 
0.87 
0.93 
0.90 
0.93 
0.93 


n/r 
n/r 


n/r 


0.93 


0.56 
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Max. 
Raw 
Score 

Total Group 81 
Gender 
Male 81 
Female 81 
Ethnicity 
White 81 
Black/African American 81 
Asian/Pacific Islander 81 
American Indian/Alaska Native n/r 
Hispanic/Latino 81 
Multiple n/r 
Special Instruction Needs 
Economically Disadvantaged 81 
Not Economically Disadvantaged 81 
English Learner 81 
Non-English Learner 81 
Students with Disabilities 81 
Students without Disabilities 81 
Students Taking Accommodated
Forms 
ASL n/r 
Closed-Caption n/r 
Screen Reader n/r 
Text-to-Speech 81 
Students Taking Translated Forms 
Spanish Language Form n/r 


n/r = not reported due to n<100. 




Avg. Minimum Reliability Maximum Reliability 


Avg. 
SEM Reliability 
3.28 0.92 
3.24 0.93 
3.31 0.91 
3.50 0.91 
3.06 0.87 
3.94 0.96 
n/r n/r 
2.94 0.86 
n/r n/r 
2.99 0.86 
3.42 0.93 
2.54 0.76 
3.32 0.92 
2.80 0.83 
3.36 0.93 
n/r n/r 
n/r n/r 
n/r n/r 
3.36 0.91 
n/r n/r 




N 


779 


428 
351 


340 
107 
245 
n/r 
1,413 


n/r 


1,596 
509 
239 
650 
223 
556 


n/r 
n/r 
n/r 


689 


n/r 


Alpha 


0.91 


0.92 
0.91 


0.90 
0.86 
0.96 

n/r 
0.85 


n/r 


0.85 
0.91 
0.73 
0.91 
0.81 
0.91 


n/r 
n/r 
n/r 


0.91 


n/r 


N 


4,216 


2,088 
2,128 


1,666 
592 
245 

n/r 
270 


n/r 


269 
2,611 
129 
3,977 
665 
3,540 


n/r 
n/r 
n/r 


689 


n/r 


Alpha 


0.92 


0.93 
0.92 


0.92 
0.88 
0.96 

n/r 
0.92 


n/r 


0.91 
0.93 
0.82 
0.92 
0.84 
0.93 


n/r 
n/r 
n/r 


0.91 


n/r 
Max. Raw Score 

Total Group 81 
Gender 
Male 81 
Female 81 
Ethnicity 
White 81 
Black/African American 81 
Asian/Pacific Islander 81 
American Indian/Alaska Native 81 
Hispanic/Latino 81 
Multiple 81 
Special Instruction Needs 
Economically Disadvantaged 81 
Not Economically Disadvantaged 81 
English Learner 81 
Non-English Learner 81 
Students with Disabilities 81 
Students without Disabilities 81 
Students Taking Accommodated Forms 
ASL n/r 
Closed-Caption n/r 
Screen Reader n/r 
Text-to-Speech 81 
Students Taking Translated Forms 
Spanish Language Form 81 


n/r = not reported due to n<100. 




Avg. Minimum Reliability Maximum Reliability 


Avg. 
SEM Reliability 
3.39 0.94 
3.33 0.95 
3.44 0.93 
3.75 0.94 
3.02 0.88 
4.04 0.96 
2.86 0.76 
2.97 0.84 
3.66 0.94 
2.97 0.85 
3.64 0.95 
2.62 0.74 
3.45 0.94 
2.86 0.88 
3.46 0.94 
n/r n/r 
n/r n/r 
n/r n/r 
3.36 0.91 
3.18 0.93 




N 


709 


407 
302 


317 
751 
408 
404 
248 
177 


2,891 
441 
155 
554 
276 
431 


n/r 
n/r 
n/r 


689 


633 


Alpha 


0.93 


0.93 
0.92 


0.94 
0.88 
0.96 
0.76 
0.80 
0.94 


0.85 
0.93 
0.69 
0.93 
0.85 
0.93 


n/r 
n/r 
n/r 


0.91 


0.93 


N 


7,238 


3,448 
3,790 


2,675 
751 
408 
404 

2,740 
177 


261 
4,163 
546 
6,691 
794 
6,370 


n/r 
n/r 
n/r 


689 


633 


Alpha 


0.94 


0.95 
0.93 


0.94 
0.88 
0.96 
0.76 
0.85 
0.94 


0.88 
0.95 
0.75 
0.94 
0.90 
0.94 


n/r 
n/r 
n/r 


0.91 


0.93 
Grade   Reading: Total        Reading: Literature   Reading: Information  Reading: Vocabulary   Writing: Total        Writing: Written      Writing: Knowledge
Level                                                                                                                 Expression            Language and Conventions
        Max Raw  Average      Max Raw  Average      Max Raw  Average      Max Raw  Average      Max Raw  Average      Max Raw  Average      Max Raw  Average
        Score    Reliability  Score    Reliability  Score    Reliability  Score    Reliability  Score    Reliability  Score    Reliability  Score    Reliability
9       64       0.89         24       0.77         24       0.75         16       0.64         45       0.88         36       0.88         9        0.88
10      64       0.88         24       0.74         24       0.74         16       0.68         45       0.90         36       0.91         9        0.92
11      64       0.89         24       0.78         28       0.77         12       0.57         45       0.88         36       0.88         9        0.89
Table ADD.13.9 Average Mathematics Reliability Estimates for Fall 2018 Total Test and Subscores 

        Major Content         Additional and        Mathematics Reasoning Modeling Practice
                              Supporting Content
Grade   Max Raw  Average      Max Raw  Average      Max Raw  Average      Max Raw  Average
Level   Score    Reliability  Score    Reliability  Score    Reliability  Score    Reliability
A1      26       0.81         17       0.63         14       0.74         18       0.76
GO      30       0.82         19       0.69         14       0.75         18       0.71
A2      22       0.80         20       0.71         14       0.76         18       0.77

Note: A1 = Algebra I, GO = Geometry, A2 = Algebra II. 
Tables ADD.13.10 and ADD.13.11 summarize the accuracy and consistency of two classifications based on scores from 
the fall block 2018 English language arts/literacy and mathematics assessments, respectively. The columns labeled 
“Exact level” refer to the classification of students into one of the five achievement levels. The columns labeled 
“Level 4 or higher vs. 3 or lower” refer to the classification of students into either the upper two levels 
(Levels 4 and 5) or the lower three levels (Levels 1, 2, and 3). 


Tables ADD.13.12 to ADD.13.17 provide more detailed information about the accuracy and consistency of the 
classification of students into proficiency levels for each fall block 2018 assessment. Each cell in the 5-by-5 table 
shows the estimated proportion of students who would be classified into a particular combination of proficiency 
levels. The sum of the five bold values on the diagonal should equal the exact-level decision accuracy or 
consistency presented in Table ADD.13.10 or ADD.13.11 for the corresponding assessment. For “Level 4 or 
higher vs. 3 or lower,” the sum of the shaded values in Tables ADD.13.12 to ADD.13.17 should equal the 
corresponding decision accuracy or consistency in Table ADD.13.10 or ADD.13.11. Note that sums computed from 
the tabled values may not exactly match the summary values because of truncation and rounding. 
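
The two summary statistics described above can be recomputed from any of the 5-by-5 tables. The following is a minimal sketch, not the procedure used to produce the report: it uses the decision accuracy cells of Table ADD.13.12 (Grade 9 ELA/L), assumes the five scale score bands in the rows correspond to Levels 1 through 5 in order, and uses hypothetical function names. Because the cells are rounded to two decimals, the recomputed sums can differ from the summary tables by a few hundredths.

```python
import numpy as np

# Decision accuracy cells from Table ADD.13.12 (Grade 9 ELA/L).
# Assumption: rows (scale score bands) and columns are both ordered Level 1..5.
accuracy = np.array([
    [0.09, 0.01, 0.00, 0.00, 0.00],
    [0.02, 0.12, 0.03, 0.00, 0.00],
    [0.00, 0.03, 0.18, 0.04, 0.00],
    [0.00, 0.00, 0.04, 0.30, 0.03],
    [0.00, 0.00, 0.00, 0.02, 0.07],
])


def exact_level_agreement(cells):
    """Sum of the diagonal: same level on both classifications."""
    return float(np.trace(cells))


def level4_split_agreement(cells, cut=3):
    """Sum of the two agreement blocks for 'Level 4 or higher vs. 3 or lower'.

    cut=3 puts Levels 1-3 in the lower block and Levels 4-5 in the upper block.
    """
    lower = cells[:cut, :cut].sum()   # both classifications in Levels 1-3
    upper = cells[cut:, cut:].sum()   # both classifications in Levels 4-5
    return float(lower + upper)


exact = exact_level_agreement(accuracy)
split = level4_split_agreement(accuracy)
print(f"exact level: {exact:.2f}")                    # 0.76
print(f"Level 4 or higher vs. 3 or lower: {split:.2f}")  # 0.90
```

The same two functions apply unchanged to the decision consistency half of each table.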


Table ADD.13.10 Reliability of Classification: Summary for ELA/L Fall 2018 


        Decision Accuracy:                   Decision Consistency:
        Proportion Accurately Classified     Proportion Consistently Classified
Grade   Exact     Level 4 or higher          Exact     Level 4 or higher
Level   Level     vs. 3 or lower             Level     vs. 3 or lower
9       0.77      0.92                       0.68      0.89
10      0.75      0.95                       0.67      0.92
11      0.76      0.94                       0.68      0.92


Table ADD.13.11 Reliability of Classification: Summary for Mathematics Fall 2018 


        Decision Accuracy:                   Decision Consistency:
        Proportion Accurately Classified     Proportion Consistently Classified
        Exact     Level 4 or higher          Exact     Level 4 or higher
Level   Level     vs. 3 or lower             Level     vs. 3 or lower
A1      0.78      0.95                       0.70      0.93
GO      0.79      0.95                       0.69      0.93
A2      0.80      0.96                       0.73      0.94


Note: A1 = Algebra I, GO = Geometry, A2 = Algebra II. 




Table ADD.13.12 Reliability of Classification: Grade 9 ELA/L Fall 2018 


2019 Technical Report 


                       Full Summative                                                     Category
                       Scale Score      Level 1   Level 2   Level 3   Level 4   Level 5   Total
Decision Accuracy      650-699          0.09      0.01      0.00      0.00      0.00      0.11
                       700-724          0.02      0.12      0.03      0.00      0.00      0.17
                       725-749          0.00      0.03      0.18      0.04      0.00      0.26
                       750-809          0.00      0.00      0.04      0.30      0.03      0.37
                       810-850          0.00      0.00      0.00      0.02      0.07      0.09
Decision Consistency   650-699          0.09      0.02      0.00      0.00      0.00      0.11
                       700-724          0.03      0.10      0.05      0.00      0.00      0.18
                       725-749          0.00      0.04      0.15      0.06      0.00      0.25
                       750-809          0.00      0.00      0.06      0.27      0.03      0.36
                       810-850          0.00      0.00      0.00      0.03      0.07      0.10

Table ADD.13.13 Reliability of Classification: Grade 10 ELA/L Fall 2018 
                       Full Summative                                                     Category
                       Scale Score      Level 1   Level 2   Level 3   Level 4   Level 5   Total
Decision Accuracy      650-699          0.34      0.05      0.00      0.00      0.00      0.39
                       700-724          0.03      0.14      0.04      0.00      0.00      0.22
                       725-749          0.00      0.04      0.10      0.03      0.00      0.17
                       750-809          0.00      0.00      0.03      0.11      0.02      0.16
                       810-850          0.00      0.00      0.00      0.01      0.06      0.07
Decision Consistency   650-699          0.33      0.06      0.01      0.00      0.00      0.40
                       700-724          0.04      0.11      0.04      0.00      0.00      0.20
                       725-749          0.00      0.05      0.08      0.03      0.00      0.16
                       750-809          0.00      0.01      0.04      0.10      0.02      0.16
                       810-850          0.00      0.00      0.00      0.02      0.06      0.08

Table ADD.13.14 Reliability of Classification: Grade 11 ELA/L Fall 2018 
                       Full Summative                                                     Category
                       Scale Score      Level 1   Level 2   Level 3   Level 4   Level 5   Total
Decision Accuracy      650-699          0.31      0.04      0.00      0.00      0.00      0.35
                       700-724          0.04      0.16      0.04      0.00      0.00      0.25
                       725-749          0.00      0.04      0.13      0.03      0.00      0.19
                       750-809          0.00      0.00      0.03      0.13      0.01      0.17
                       810-850          0.00      0.00      0.00      0.01      0.03      0.04
Decision Consistency   650-699          0.30      0.05      0.00      0.00      0.00      0.36
                       700-724          0.05      0.13      0.05      0.00      0.00      0.24
                       725-749          0.00      0.05      0.10      0.03      0.00      0.19
                       750-809          0.00      0.00      0.04      0.11      0.01      0.17
                       810-850          0.00      0.00      0.00      0.01      0.03      0.05

Table ADD.13.15 Reliability of Classification: Algebra I Fall 2018 
                       Full Summative                                                     Category
                       Scale Score      Level 1   Level 2   Level 3   Level 4   Level 5   Total
Decision Accuracy      650-699          0.17      0.03      0.00      0.00      0.00      0.21
                       700-724          0.04      0.25      0.05      0.00      0.00      0.33
                       725-749          0.00      0.04      0.20      0.03      0.00      0.27
                       750-809          0.00      0.00      0.03      0.15      0.00      0.18
                       810-850          0.00      0.00      0.00      0.00      0.01      0.01
Decision Consistency   650-699          0.16      0.05      0.00      0.00      0.00      0.22
                       700-724          0.05      0.21      0.06      0.00      0.00      0.32
                       725-749          0.00      0.06      0.17      0.03      0.00      0.26
                       750-809          0.00      0.00      0.04      0.14      0.01      0.19
                       810-850          0.00      0.00      0.00      0.00      0.01      0.01

Table ADD.13.16 Reliability of Classification: Geometry Fall 2018 
                       Full Summative                                                     Category
                       Scale Score      Level 1   Level 2   Level 3   Level 4   Level 5   Total
Decision Accuracy      650-699          0.09      0.04      0.00      0.00      0.00      0.12
                       700-724          0.03      0.37      0.04      0.00      0.00      0.43
                       725-749          0.00      0.05      0.19      0.03      0.00      0.28
                       750-809          0.00      0.00      0.02      0.12      0.01      0.15
                       810-850          0.00      0.00      0.00      0.00      0.02      0.02
Decision Consistency   650-699          0.08      0.06      0.00      0.00      0.00      0.14
                       700-724          0.03      0.32      0.05      0.00      0.00      0.41
                       725-749          0.00      0.07      0.17      0.04      0.00      0.28
                       750-809          0.00      0.00      0.03      0.11      0.01      0.15
                       810-850          0.00      0.00      0.00      0.01      0.02      0.02

Table ADD.13.17 Reliability of Classification: Algebra II Fall 2018 
                       Full Summative                                                     Category
                       Scale Score      Level 1   Level 2   Level 3   Level 4   Level 5   Total
Decision Accuracy      650-699          0.39      0.05      0.00      0.00      0.00      0.45
                       700-724          0.03      0.18      0.03      0.00      0.00      0.24
                       725-749          0.00      0.04      0.08      0.02      0.00      0.14
                       750-809          0.00      0.00      0.02      0.12      0.01      0.15
                       810-850          0.00      0.00      0.00      0.00      0.02      0.02
Decision Consistency   650-699          0.38      0.07      0.00      0.00      0.00      0.45
                       700-724          0.05      0.15      0.04      0.00      0.00      0.23
                       725-749          0.00      0.05      0.07      0.02      0.00      0.14
                       750-809          0.00      0.00      0.03      0.11      0.01      0.15
                       810-850          0.00      0.00      0.00      0.01      0.02      0.02

The intercorrelations for the fall 2018 assessments are presented in Tables ADD.14.1 through ADD.14.3 for ELA/L 
grades 9, 10, and 11 and in Tables ADD.14.4 through ADD.14.6 for the traditional mathematics courses (A1, GO, A2). 
As with the spring intercorrelations, the ELA/L intercorrelations are all moderate to high, and the writing subclaims 
are highly intercorrelated. The mathematics intercorrelations are moderate. Tables ADD.14.7 through ADD.14.9 
present the correlations between ELA/L and mathematics from the fall block. The shaded values along the diagonal 
of each intercorrelation table are the reliabilities as reported in Addendum 13. The average intercorrelations are 
provided in the lower portion of the table, and the total sample sizes are provided in the upper portion. 
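
Because each intercorrelation table packs three kinds of values into one square layout, it can help to separate them before further use. The sketch below is illustrative only: it uses the Algebra I values from Table ADD.14.4, and the function name `split_combined` is hypothetical.

```python
import numpy as np

labels = ["MC", "ASC", "MR", "MP"]

# Table ADD.14.4 (Algebra I): reliabilities on the diagonal,
# average intercorrelations below it, sample sizes above it.
combined = np.array([
    [0.81, 16282, 16282, 16282],
    [0.78, 0.62, 16282, 16282],
    [0.75, 0.69, 0.74, 16282],
    [0.78, 0.70, 0.72, 0.76],
])


def split_combined(matrix):
    """Return (reliabilities, correlations, sample_sizes) from a combined table."""
    reliabilities = np.diag(matrix).copy()
    lower = np.tril_indices_from(matrix, k=-1)   # cells strictly below the diagonal
    upper = np.triu_indices_from(matrix, k=1)    # cells strictly above the diagonal
    correlations = matrix[lower]
    sample_sizes = matrix[upper].astype(int)
    return reliabilities, correlations, sample_sizes


rel, corr, n = split_combined(combined)

# Pair each correlation with the two subclaims it relates (row label, column label).
rows, cols = np.tril_indices(len(labels), k=-1)
for i, j, r in zip(rows, cols, corr):
    print(f"{labels[i]}-{labels[j]}: {r:.2f}")
```

Keeping the sample sizes in a separate integer array avoids accidentally treating them as correlations in later summaries.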


Table ADD.14.1 Average Intercorrelations and Reliability between Grade 9 ELA/L Subclaims 


       RD      RL      RI      RV      WR      WE      WKL
RD     0.89    4,630   4,630   4,630   4,630   4,630   4,630
RL     0.92    0.77    4,630   4,630   4,630   4,630   4,630
RI     0.91    0.73    0.75    4,630   4,630   4,630   4,630
RV     0.85    0.68    0.67    0.64    4,630   4,630   4,630
WR     0.75    0.72    0.71    0.54    0.88    4,630   4,630
WE     0.74    0.72    0.70    0.53    1.00    0.88    4,630
WKL    0.74    0.72    0.70    0.53    0.98    0.96    0.88


Note: RD = Reading, RL = Reading Literature, RI = Reading Information, RV = Reading Vocabulary, WR = Writing, WE 
= Written Expression, and WKL = Writing Knowledge and Conventions. 


Table ADD.14.2 Average Intercorrelations and Reliability between Grade 10 ELA/L Subclaims 


       RD      RL      RI      RV      WR      WE      WKL
RD     0.88    15,277  15,277  15,277  15,277  15,277  15,277
RL     0.92    0.74    15,277  15,277  15,277  15,277  15,277
RI     0.90    0.74    0.74    15,277  15,277  15,277  15,277
RV     0.87    0.72    0.68    0.68    15,277  15,277  15,277
WR     0.81    0.77    0.77    0.64    0.90    15,277  15,277
WE     0.81    0.76    0.77    0.63    1.00    0.91    15,277
WKL    0.81    0.76    0.77    0.64    0.98    0.97    0.92


Note: RD = Reading, RL = Reading Literature, RI = Reading Information, RV = Reading Vocabulary, WR = Writing, WE 
= Written Expression, and WKL = Writing Knowledge and Conventions. 
Table ADD.14.3 Average Intercorrelations and Reliability between Grade 11 ELA/L Subclaims 


       RD      RL      RI      RV      WR      WE      WKL
RD     0.89    11,086  11,086  11,086  11,086  11,086  11,086
RL     0.92    0.78    11,086  11,086  11,086  11,086  11,086
RI     0.93    0.76    0.77    11,086  11,086  11,086  11,086
RV     0.79    0.64    0.63    0.57    11,086  11,086  11,086
WR     0.77    0.73    0.75    0.52    0.88    11,086  11,086
WE     0.77    0.73    0.75    0.52    1.00    0.88    11,086
WKL    0.76    0.72    0.74    0.52    0.98    0.97    0.89


Note: RD = Reading, RL = Reading Literature, RI = Reading Information, RV = Reading Vocabulary, WR = Writing, WE 
= Written Expression, and WKL = Writing Knowledge and Conventions. 


Table ADD.14.4 Average Intercorrelations and Reliability between Algebra I Subclaims 


Mathematics 
MC ASC MR MP 


MC 0.81 16,282 16,282 16,282 
ASC 0.78 0.62 16,282 16,282 
MR 0.75 0.69 0.74 16,282 
MP 0.78 0.70 0.72 0.76 


Note: MC = Major Content, ASC = Additional and Supporting Content, MR = Mathematical Reasoning, and MP = 
Modeling Practice. 


Table ADD.14.5 Average Intercorrelations and Reliability between Geometry Subclaims 


Mathematics 
MC ASC MR MP 


MC 0.82 5,505 5,505 5,505 
ASC 0.76 0.69 5,505 5,505 
MR 0.72 0.67 0.75 5,505 
MP 0.75 0.68 0.78 0.71 


Note: MC = Major Content, ASC = Additional and Supporting Content, MR = Mathematical Reasoning, and MP = 
Modeling Practice. 


Table ADD.14.6 Average Intercorrelations and Reliability between Algebra II Subclaims 


Mathematics 
MC ASC MR MP 


MC 0.80 9,138 9,138 9,138 
ASC 0.76 0.71 9,138 9,138 
MR 0.76 0.73 0.76 9,138 
MP 0.79 0.77 0.79 0.77 


Note: MC = Major Content, ASC = Additional and Supporting Content, MR = Mathematical Reasoning, and MP = 
Modeling Practice. 
Table ADD.14.7 Average Correlations between ELA/L and Mathematics for High School 


                    Mathematics
ELA/L     A1          GO          A2
9         0.72        0.77
          (1,051)     (366)
10        0.57        0.67        0.76
          (6,419)     (935)       (698)
11        0.65        0.41        0.55
          (208)       (1,268)     (3,588)

Note: ELA/L = English language arts/literacy, A1 = Algebra I, GO = Geometry, A2 = Algebra II. The correlations are 
provided with the sample sizes below in parentheses. 


Table ADD.14.8 Average Correlations between Reading and Mathematics for High School 


                    Mathematics
RD        A1          GO          A2
9         0.70        0.76
          (1,051)     (366)
10        0.55        0.66        0.75
          (6,419)     (935)       (698)
11        0.67        0.43        0.56
          (208)       (1,268)     (3,588)

Note: RD = Reading, A1 = Algebra I, GO = Geometry, A2 = Algebra II. The correlations are provided with the sample 
sizes below in parentheses. 


Table ADD.14.9 Average Correlations between Writing and Mathematics for High School 


                    Mathematics
WR        A1          GO          A2
9         0.62        0.67
          (1,051)     (366)
10        0.46        0.58        0.69
          (6,419)     (935)       (698)
11        0.55        0.29        0.43
          (208)       (1,268)     (3,588)

Note: WR = Writing, A1 = Algebra I, GO = Geometry, A2 = Algebra II. The average correlations are provided with the 
sample sizes below in parentheses. 